The purpose of this study is to find or develop some method for evaluating and measuring the performance of aircraft maintenance technicians in the United States Air Force. This evaluation method is to be used in another research effort to develop a model or models for predicting or evaluating the effectiveness of maintenance technician performance.

The performance appraisal method developed in this study is based on a review of the literature on the subject. A literature review has been necessary, as existing appraisal methods either are not applicable to statistical analysis, are highly inflated, or provide incomplete and non-current coverage of maintenance organizations. The performance appraisal method developed relies on subjective supervisor appraisals of maintenance technician quantity and quality of performance.

An evaluation of the performance appraisal method has been conducted within the aircraft maintenance organization of one pilot training base. The random sample consists of 20% of the assigned technicians. Thirty-six supervisory groups of five or fewer technicians per group have been selected and found to represent the organization as a whole in terms of experience and relative manning. Quality of performance ratings have a mean value of 7.2 (median of 8.0) on a 10.0 scale, while quantity of performance ratings have a mean value of 6.6 (median of 7.0).
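The sample described above is drawn by supervisory group rather than by individual technician, which makes it a cluster sample. The Python sketch below shows one way such a sample could be drawn; the group identifiers, the group sizes, and the rule "add randomly ordered groups until at least 20% of technicians are covered" are illustrative assumptions, not the selection procedure documented in the study.

```python
import random


def draw_cluster_sample(groups, target_fraction=0.20, seed=None):
    """Randomly select whole supervisory groups (clusters) until at least
    target_fraction of all technicians are covered.

    groups: dict mapping a hypothetical group id to its list of technicians.
    Returns the selected group ids in the order they were drawn.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in groups.values())
    target = target_fraction * total

    order = list(groups)   # all group ids
    rng.shuffle(order)     # random order of clusters

    selected, covered = [], 0
    for gid in order:
        if covered >= target:
            break
        selected.append(gid)
        covered += len(groups[gid])
    return selected


# Hypothetical organization: 50 groups of 4 technicians (200 technicians)
groups = {f"G{i:02d}": [f"T{i:02d}-{j}" for j in range(4)] for i in range(50)}
sample = draw_cluster_sample(groups, seed=1)
print(len(sample))  # 10 groups = 40 technicians = 20%
```

Sampling whole groups keeps each supervisor's ratings together, which matters when, as here, the supervisor is the rater.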

The quality of performance data shows only marginal correlation with existing personnel inspection data. The performance ratings as a whole, however, display superior face validity and usefulness compared to existing personnel inspection data.
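The abstract does not name the correlation coefficient used to compare ratings with inspection data. Assuming the ordinary Pearson product-moment coefficient, a minimal sketch of the computation is shown below; the function name is hypothetical, and no study data are reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

On Python 3.10 or later, `statistics.correlation(x, y)` returns the same coefficient. A "marginal" correlation would be a value of r close to zero.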
ABSTRACT


Joel R. Hickman, Arizona State University, December, 1979, Air Force Maintenance Technician Performance Measurement, Major Professor: Hewitt H. Young, Ph.D.

The  purpose  of  this  study  was  to  find  or  develop  some 
method  for  evaluating  and  measuring  the  performance  of 
aircraft  maintenance  technicians  in  the  United  States  Air 
Force.  This  evaluation  method  was  then  to  be  used  in  another 
research  effort  to  develop  a  model  or  models  for  predicting 
or  evaluating  the  effectiveness  of  maintenance  technician 
performance. 

The  performance  appraisal  method  developed  in  this 
study was based on a review of the literature on the subject.
A  literature  review  was  necessary,  as  existing  appraisal 
methods  either  were  not  applicable  to  statistical  analysis, 
were  highly  inflated,  or  provided  incomplete  and  non-current 
coverage  of  maintenance  organizations.  The  performance 
appraisal  method  developed  relied  on  subjective  supervisor 
appraisals  of  maintenance  technician  quantity  and  quality 
of  performance. 

An evaluation of the performance appraisal method was conducted within the aircraft maintenance organization of one pilot training base. The random sample for the evaluation consisted of 20% of the assigned technicians. Thirty-six supervisory groups of five or fewer technicians per group were selected and found to represent the organization as a whole in terms of experience and relative branch manning. The resultant quality of performance ratings had a
mean value of 7.2 (median of 8.0) on a 10.0 scale, while
quantity  of  performance  ratings  had  a  mean  value  of  6.6 
(median  of  7.0).  These  skewed  results  presented  potential 
difficulties  for  regression  modeling  and  for  the  comparison 
of  distributions.  However,  these  difficulties  were  overcome 
for  regression  modeling,  while  the  quantity  and  quality 
distributions  were  found  to  be  significantly  different. 
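Ratings bunched near the top of a bounded scale typically have a long left tail, which shows up as a mean below the median and a negative skewness coefficient. The sketch below computes the adjusted Fisher-Pearson sample skewness; the choice of this particular coefficient and the rating values are assumptions for illustration, not the study's data or method.

```python
import math
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (G1).

    Negative values indicate a long left tail, as when most ratings
    cluster near the top of a bounded scale.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)  # small-sample adjustment


# Hypothetical 10-point ratings bunched near the top of the scale
ratings = [3, 6, 7, 8, 8, 8, 9, 9, 10]
print(statistics.fmean(ratings), statistics.median(ratings), sample_skewness(ratings))
```

A strongly negative value warns that regression residuals and distribution comparisons that assume normality may be affected.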

The  quality  of  performance  data  showed  only  marginal 
correlation  with  existing  personnel  inspection  data.  In 
addition,  the  use  of  numbered  gradations  on  the  performance 
appraisal  scales  resulted  in  performance  histograms  which 
were  not  useable  in  most  non-parametric  tests  and  which 
reduced  the  power  of  parametric  tests  for  comparisons.  The 
performance  ratings  as  a  whole,  however,  displayed  superior 
face  validity  and  usefulness  compared  to  existing  personnel 
inspection data.
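The abstract does not spell out why numbered gradations hampered the non-parametric tests. One plausible mechanism, assumed here, is that whole-number ratings on a 10-point scale produce many tied values, and rank-based tests must give tied observations the average (mid) rank. The sketch below computes midranks and the share of observations involved in ties; the function names are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tied_fraction(values):
    """Fraction of observations that share their value with another observation."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1) / len(values)


print(midranks([7, 8, 8, 9]))       # [1.0, 2.5, 2.5, 4.0]
print(tied_fraction([7, 8, 8, 9]))  # 0.5
```

With dozens of technicians rated on only ten gradations, most observations end up tied, which weakens or invalidates tests that assume continuous data.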
Chapter 1


INTRODUCTION 

One of the greatest needs of managers of the military weapons system maintenance complex is to measure accurately how well individuals perform on the job. Individual job performance forms one of the bases for performance by the entire organization. If the effectiveness of weapons system maintenance is to be improved, then individual performance must also be measurable and subject to improvement. As stated by Cummings and Schwab (1973:56), in general "the measurement and assessment of human performance is crucial to effective utilization in order to provide the basis for feedback into the input-processing and input-conversion stages..." of the organizational control process.

Quantifying job effectiveness is, however, difficult. Campbell et al. (1970:101) feel that "Quantifying job effectiveness has been industrial psychology's major bugaboo since its inception." Decades of research by psychologists and personnel experts have failed to provide definitive answers to the question of how to measure performance or effectiveness. Air Force Manual 66-1 (AFM 66-1), Volume I, Maintenance Management (1975:A3-2), allows that the measures of personnel performance form the basis for capability predictions. These measures are, however, difficult to assess and subject to a number of variables. As a substitute for personnel performance measures, overall maintenance support to the unit's mission is assessed. Such an approach is justifiable given the thirty thousand tasks that, according to Wiley (1978:5), Air Force maintenance performs. Existing official supervisor ratings (e.g., Airmen Performance Ratings) do not serve the performance measurement purpose either, as they are general in nature, are not specifically related to tasks and jobs, are highly inflated, and do not discriminate among individuals.

This study considers the available rating techniques, recommends a particular rating technique, and reports on a test of the recommended technique. Chapter 2 will discuss performance and will conclude with a suggested rating scheme. Test methodology will be provided in Chapter 3; Chapter 4 will report on the analysis of test results; and Chapters 5 and 6 will contain an interpretation and a summary of the test results.


Purpose 


The purpose of this study is to find or develop a method for evaluating and measuring the performance of aircraft maintenance technicians in the United States Air Force. This evaluation method will ultimately be used as a performance measure of maintenance manpower effectiveness in a research effort to develop a model or models for predicting and evaluating the effectiveness of maintenance technician performance (see Young, 1976:15).
The performance rating used must involve minimum development time and cost. These limitations restrict the approaches that can be used. The primary approach used here is a review of published material dealing with performance, with an emphasis on previous studies of Air Force maintenance activities. The recommended performance rating method will then be tested.

Besides cost and time restrictions, any performance evaluation method should meet the following criteria:

1. Be useful for describing performance to management.

2. Be valid as a measurement of maintenance technician performance.

3. Be applicable to different types of performance tasks, such as repair, service, and preventive maintenance.

4. Be applicable to both military and civilian employees of the Air Force.

5. Provide a performance measure throughout the many levels of weapon systems maintenance.

6. Provide valid information for statistical analysis in the form of normal performance distributions with constant variance.
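Criterion 6 asks for constant variance across the groups being compared. One simple screen, assumed here rather than taken from the study, is Hartley's Fmax ratio: the largest group variance divided by the smallest. The grouping (for example, by supervisory group or branch) is a hypothetical choice, and the check says nothing about normality, which would need a separate test.

```python
import statistics


def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance across groups.

    Each group must contain at least two ratings. Values near 1 suggest
    roughly constant variance; large values cast doubt on that assumption.
    """
    variances = [statistics.variance(g) for g in groups]
    smallest = min(variances)
    if smallest == 0:
        raise ValueError("a group has zero variance; the ratio is undefined")
    return max(variances) / smallest


print(hartley_fmax([[1, 2, 3], [2, 4, 6]]))  # 4.0
```

Critical values for Fmax depend on the number of groups and their sizes and must be taken from published tables; the sketch only computes the statistic.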

These objectives impose severe restrictions on any possible measurement system. However, satisfying such restrictions is imperative if any research effort is to provide an accurate analysis of the motivation and ability factors affecting performance. As Guion (1965:90) has stated, "interest should be focused upon what is to be predicted."

In short, the purpose of this study is to answer the following questions:

1.  What  is  the  best  research  method  for  evaluating 
or  measuring  performance  of  aircraft  maintenance 
technicians  in  the  United  States  Air  Force? 

2.  Does this method for evaluating or measuring performance provide useful and valid statistical


Chapter  2 


THEORETICAL  PERFORMANCE  MEASURES 

This chapter deals with the available methods for evaluating and measuring performance based on a review of the literature. The following considerations will be discussed:

1.  Organization  structure. 

2.  Quality  of  ratings. 

3.  Performance  criteria. 

4.  Appraisal methods.

5.  Rating scale errors.

6.  Scale  format. 

7.  The  raters. 

A  suggested  rating  scheme  based  on  the  above  considerations 
will  be  provided  at  the  conclusion  of  Chapter  2. 

Organization  Structure 

The Air Force maintenance structure involves thousands of personnel performing a vast variety of functions. Thus, any performance measure must be applicable to different organizational levels. This is a difficult requirement to satisfy, as McDonnell (1979) reports that there are forty-five thousand Air Force members in the aircraft maintenance field alone.

Maintenance is concerned with aircraft and missiles and is performed by military or civilian technicians of both sexes. The three overall levels of maintenance organization are known as base or organizational, intermediate, and depot. Base level maintenance consists of inspecting, servicing, and replacing parts. Intermediate level maintenance is often indistinguishable from base level maintenance and consists of calibrating or replacing damaged or unserviceable parts, of modifying material, and of emergency manufacturing of unavailable parts. Depot level maintenance augments stocks of serviceable material with more extensive shop facilities and personnel of higher technical skill level (usually civilian employees). Although the present research will include only base level organizations, provision must be included for making the proposed performance measurement technique applicable to all levels for further evaluation.

Further generality of the rating technique is mandated by the varied tasks performed by a base level maintenance organization. A typical Air Force base with a mission involving aircraft might include field maintenance (FMS), organizational maintenance (OMS), avionics maintenance (AMS), and munitions maintenance (MMS) squadrons (see Figures 1, 2, 3, 4, and 5). Meister, Finley, and Thompson (1971), Foley (1974), and Wiley (1978) have considered automatic flight control maintenance performance in the AMS alone, while Sauer, Campbell, and Potter (1977) dealt with Short Range Attack Missile maintenance in the MMS alone. Enlarging the scope of a performance measurement tool to include repair, fabrication, and preventive maintenance personnel, as well as flightline launch and recovery personnel, requires either generalized rating scales applicable to many technician specialties, or specific, noncomparable measures for each specialty. Separate measures would, however, make any analysis of overall performance within a squadron impossible.

The nature of the maintenance organization strongly favors the use of general individual performance measures. Such measures would be applicable to the varied tasks and functions for which the different technicians are responsible. Since most maintenance is performed by teams of five to ten technicians working under one supervisor, the supervisor could evaluate his personnel if a general, subjective performance measure were to be used. Thus, due to the structure, size, and complexity of the Air Force maintenance system, the present research effort must use a new, subjective, and generalized performance measurement system.

Quality  of  Ratings 

A performance measure is successful, according to Barrett (1966:12), only if it meets three standards:

It must be acceptable to the people who use it; it must cover what is important and only what is important; and a systematic examination of the results of ratings must show that they are reasonably free from important defects.


[Organization charts: Organizational Maintenance; Avionics Maintenance]
Acceptability


The performance data which will eventually be used to develop performance effectiveness models must be accepted by maintenance managers and evaluators as well as research personnel. The easiest way to gain acceptance might be to use existing measures such as Airmen Performance Ratings (APRs) or Merit Ratings for civilian personnel. However, these measures are used for the administrative purposes of promotion and wage administration, and not for developmental purposes. McGregor (1957) and Barrett (1966) warn against mixing such incompatible purposes in one program, as management is placed in the incompatible role of judge and counselor.

If a new performance measure is to be developed, it might be advisable to solicit the opinions of managers by using surveys or limited acceptance tests as to criterion utility. An alternative to either using existing measures or soliciting manager opinions as to acceptability is to develop criterion-referenced test measures. A criterion-referenced test measures what an individual can do, or knows, compared to what he must be able to do, or must know, in order to complete a task successfully (Glaser and Nitko, 1971; Swezey and Pearlstein, 1975). Such Criterion-Referenced Job Task Performance Tests (JTPT) were experimentally developed by Foley (1974) for electronic maintenance tasks after much time and effort. Such objective tests might prove to be more acceptable than subjective performance judgments, such as supervisors' ratings.

Relevance 

Acceptance is not enough; a measure that omits essentials or gives weight to trivia is defective. Barrett (1966) feels that a clear statement of the objectives of the ratings is the first step, while Guion (1965) believes that the first step is a judgment of the importance of the concept being developed. Both authors agree that the second step is a clear statement of what the job requires and the kinds of job behavior that are essential to success. As Barrett points out, punctuality may be important in an automated office where each person's performance affects his neighbors, but it is unrelated to the success of a door-to-door salesman.

In deciding whether a rating is relevant, it is helpful to check it against standards described by Brogden and Taylor (1950). The three defects they identify are deficiency, contamination, and distortion.

Deficiency. This defect results if the measure of performance lacks any elements necessary to give adequate coverage. Rating or ranking of "overall performance" gives the illusion that everything is included while, in fact, raters may have different concepts of job elements and different ideas of what constitutes successful performance. Cummings and Schwab (1973:46) also consider measurement deficiency to exist if employee productivity is accounted for by quantity of output alone without also considering quality of output.

Contamination. Lopez (1969:211) feels that contamination occurs when behavioral characteristics that are unrelated to job performance are included in an evaluation method. Such unrelated characteristics include "self-confidence," "self-control," and "personality."


Distortion. When several criteria are used to describe performance, it is possible to distort their importance by improper weighting. Criteria which are not specific may allow the inclusion of dramatic or easily observed events, such as frequent tardiness or a lucky break, in the evaluation.

All of these defects can be avoided with careful selection of the performance criteria to be evaluated. Procedures for selecting such criteria will be discussed next.
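Distortion through improper weighting can be seen in a toy composite score. The sketch assumes a simple weighted sum of 10-point criterion ratings; the two technicians, their ratings, and both sets of weights are hypothetical. Over-weighting an easily observed criterion such as punctuality reverses the order produced by weights that emphasize quality and quantity of work.

```python
def composite(scores, weights):
    """Weighted sum of criterion scores; both dicts map criterion -> number."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_by(people, weights):
    """Names ordered from highest to lowest composite score."""
    return sorted(people, key=lambda name: composite(people[name], weights), reverse=True)


# Hypothetical ratings on a 10-point scale
people = {
    "A": {"quality": 9, "quantity": 6, "punctuality": 5},
    "B": {"quality": 6, "quantity": 7, "punctuality": 10},
}
balanced = {"quality": 0.5, "quantity": 0.4, "punctuality": 0.1}
distorted = {"quality": 0.2, "quantity": 0.2, "punctuality": 0.6}

print(rank_by(people, balanced))   # ['A', 'B']
print(rank_by(people, distorted))  # ['B', 'A']
```

Nothing about either technician changed between the two calls; only the weights did, which is the sense in which weighting alone can distort an evaluation.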
Performance Criteria

The ideal topics for rating must be both important and ratable. As Barrett (1966:33) points out, these two attributes do not necessarily go together, as some trivial areas such as regularity of haircuts may be accurately rated while important concepts such as output and quality are harder to pin down.

In general, Lopez (1?63:3?) believes that performance refers to a specific kind of human behavior in a "system" environment and activity. He feels that some employee performance evaluation procedures are designed to judge only behavior, while others are designed to judge only results. The first approach is too general and the second too narrow, because the proper object of the process is the evaluation of the act of performing in terms of both results and behavior.

Guion (1965:91-96) indicates that two types of criteria can be used: objective measures of job behavior and judgment ratings. Objective or countable measures of behavior can be grouped into two major categories: production data and personnel data. "Production data" include quantity and quality of output, while "personnel data" include absence or accident rates.

Objective  Measures 

Attempts to use objective data in analyzing maintenance performance were made by Sauer, Campbell, and Potter (1977), Foley (1974), and Meister, Finley, and Thompson (1971). Sauer, Campbell, and Potter (1977:22) attempted to use individual task performance for Short Range Attack Missile (SRAM) technicians through the Strategic Air Command (SAC) Maintenance Standardization and Evaluation Program (MSEP). This provided information on technician performance against standards for technical errors, safety errors, and reliability errors. Technician tasks, however, are designed for ease of completion, which results in very few errors and limited variability. These performance measures are thus of limited value for computing the relationship between human resource factors and task performance.

Meister, Finley, and Thompson (1971:31) utilized observers to record the performance of technicians on a very specific electronics maintenance task: autopilot repair. Two types of performance variables were recorded: those which were based on objective observation (e.g., elapsed time, error frequency, number of components removed and replaced), and those which were based on the subjective judgment of the observer and the observed technician (e.g., efficiency of performance, difficulty of task). The drawbacks of this method include the need to train observers for particular maintenance functions and the lack of relevance of the measures for service functions (e.g., refueling, canopy cleaning) performed by Organizational Maintenance Squadron personnel.

Foley (1974) advocates the use of Criterion-Referenced Job Task Performance Tests (JTPT). Ronan (197c) reports that a Task Performance Test for firemen led to the adoption of nine independent performance factors, which are superior to peer and supervisory subjective evaluations. Such systems are difficult, costly, and time-consuming to develop, according to Obradovic (1979). No such rating measures now exist for the many maintenance tasks performed by Air Force technicians.
Subjective Measures

Subjective ratings or judgments are relied upon by management as criteria for validation studies. Guion (1965:96) reports that eighty-one per cent of validation studies appearing in the Journal of Applied Psychology and Personnel Psychology between January, 1950, and July, 1955, relied upon ratings.

According to Barrett (1966:33), rating scales are concerned with three kinds of concepts: personality, performance, and product. Personality is the total of a person's characteristics. It includes emotional make-up, intelligence, and what is commonly called character. Performance has to do with how an individual goes about doing work. Included are working hard, following instructions, planning, and taking responsibility. Product is a person's output. The quantity and quality of work are product.

The most pertinent of the three is product. Management is fundamentally interested in sales, production of finished goods, and other factors that are visible and inherently measurable. Product in some cases can be measured directly (objective measurement), and in other cases it is necessary to have a rater look at the product and evaluate its quality. Measures of product often suffer from deficiency, as only part of an individual's output can be measured in objective terms. They may also be contaminated, since much of what is measured is beyond the individual's control; for example, product may be the output of many individuals, not one alone.

Existing ratings of individuals employed by the Air Force are of little value except for administrative purposes. Airmen Performance Ratings (APRs) are inflated, according to Callander (1979), and are of little value as a single performance measure. Civilian and military personnel appraisals are also privileged information, to which access is difficult to gain.

If production is not available for evaluation, the rater may evaluate how the employee goes about his work, instead of what he produces. Though not as objectively measured as products, these job performance characteristics are both ratable and important. Studies by Barrett (1961) indicate that supervisors and subordinates are quite sensitive to performance, agree on the relative importance of performance traits, and attach a great deal of weight to the performance style used on the job.

Most nebulous, but frequently rated, is personality. Employees are expected to be trustworthy, loyal, helpful, friendly, courteous, kind, and reverent. However, no one knows which of these characteristics contribute to job success, or how much they contribute. Indeed, agreement on definitions of traits is much harder to reach than agreement on product or performance.

A survey of fifty merit rating plans by Habbe (1956) shows that the element of personality, the most difficult to rate, was the most widely used. The rating of product (using Barrett's definition) was confined to quantity and quality of output. The findings are summarized in Table 1. Holley, Feild, and Barnett (1976:9-50) reported similar results on the frequency of category use.

Table 1

Frequency of Rating Categories (Habbe, 1956)

Category                 Freq.    Category                 Freq.

Group 1: The Old Standbys (Product)

Quantity of work         9.0      Quality of work          31

Group 2: Job Knowledge and Performance

Knowledge of job         25       Safety habits            7
Attendance               14       Good housekeeping        3
Punctuality              12

Group 3: Characteristics of the Individual (Personal)

Cooperativeness          36       Initiative
Dependability            35       Intelligence

The major emphasis of ratings should be on the product of an individual's effort in terms of what he or she accomplishes. When there are no products, performance is suggested as being the next best level of abstraction to deal with, while pure personality variables have little if any relevance to the performance measurement task. Hollingworth (1922:79) provides evidence that some traits are more reliably measured than others. Only personality traits were studied, and Table 2 summarizes the relative disagreement between judges concerning traits.

Table 2

Amount of Disagreement Among Judges in Estimating the Traits of Others (Hollingworth, 1922)

Trait              Divergence     Trait              Divergence

Close Agreement

Efficiency         33             Perseverance       [illegible]
Originality        [illegible]    [illegible]        [illegible]

Fair Agreement

Breadth            96             Intensity          99
Leadership         96             Reasonableness     100

Poor Agreement

Courage            109            Integrity          117
Unselfishness      110            Cooperativeness    119

The  Best  Traits 

In this case it appears that subjective appraisals are most applicable. There are, however, many potential traits that could be used. Lawler (1967:371) indicates that it is easy to err on the side of providing too many traits upon which to make ratings. Dunnette (1963:252) points out that the use of a single criterion is unrealistic, while Rush (1953:23) indicates that between three and five criterion factors surface in factor-analysis studies. The potential size of a study covering Air Force maintenance performance mandates the use of as few factors as possible.

Lawler (1967:371) indicates that one rating that probably should be included is one on quality of job performance. When people are asked to make such general ratings on quality, they act in a very predictable way, as efficient appraisers of critical incident data from their observations of an individual's performance in the past. The other traits besides quality that should be used in performance analysis are difficult to specify. They should be based on the purpose of the study and on particular types of behavior that characterize the important functions of the job. Wiley
(1978:23) included quantity of work, self-initiation, sharing of knowledge, and exceeding one's scare as additional rating dimensions. In this study, quantity and quality of output are applicable to all technician functions and are of interest to management.

Appraisal Methods

A wide variety of appraisal methods has been developed. The major appraisal methods come under four general headings: (1) comparative procedures, (2) absolute standards, (3) management by objectives (MBO), and (4) direct indexes.
Comparative Procedures

Comparative procedures are frequently characterized by two features. First, the evaluation is made by comparing one individual against another on the particular dimension of interest. Second, this comparison is often made on a general dimension which attempts to measure an employee's overall contribution to the organization. Two popular comparative procedures are straight ranking and paired comparison.


Straight Ranking. In an appraisal by ranking, the evaluator is typically asked to consider all of the employees to be appraised and identify the very best performer, the second best, and so on through all employees to the very poorest. Cummings and Schwab (1973:32) feel that this procedure is natural for most evaluators, as people are frequently informally ranked. Barrett (1966) indicates that ranking is free of leniency and central tendency errors, but the ability to show the size of performance differences between people is lost. Sauer, Campbell, and Potter (1977) used a ranking procedure with a conversion to normalized percentiles, as described by Guion, to analyze maintenance personnel performance. This procedure assumes that performance is normally distributed.
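The conversion of ranks to normalized scores can be sketched as follows. The text does not reproduce Guion's conversion table, so this sketch assumes the common midpoint rule, p = (n - rank + 0.5) / n, and uses Python's `statistics.NormalDist.inv_cdf` in place of a printed normal table; the function name is hypothetical.

```python
from statistics import NormalDist


def ranks_to_normal_scores(n):
    """Convert ranks 1..n (1 = best) into normal-deviate (z) scores.

    Each rank is turned into a percentile with the midpoint rule
    p = (n - rank + 0.5) / n, then mapped through the inverse of the
    standard normal cumulative distribution function.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for rank in range(1, n + 1):
        p = (n - rank + 0.5) / n
        scores.append(std_normal.inv_cdf(p))
    return scores


z = ranks_to_normal_scores(5)
print([round(v, 3) for v in z])  # [1.282, 0.524, 0.0, -0.524, -1.282]
```

The output makes the normality assumption visible: adjacent ranks in the middle of a group are placed closer together than adjacent ranks at the extremes, regardless of how far apart the people actually are.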


Paired Comparisons. This system requires the evaluator to compare each employee to be ranked with every other employee, one at a time. An employee's standing in the final ranking is determined by the number of times he or she is chosen over the other employees. This system can be tedious and results in a large number of comparisons when the group is large.
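The bookkeeping behind paired comparisons is easy to sketch. In the hypothetical example below, a function `prefer` stands in for the evaluator's judgment on each pair. Every pair is judged once, wins are counted, and employees are ordered by their number of wins. The count of pairs, n(n - 1)/2, shows why the method becomes tedious: four employees need 6 comparisons, and twenty need 190.

```python
from itertools import combinations

def paired_comparison_ranking(employees, prefer):
    """Rank employees by how often each is chosen in head-to-head comparisons.

    `prefer(a, b)` stands in for the evaluator's judgment and must return
    whichever of the two employees is judged the better performer.
    Employees with equal win counts keep their input order.
    """
    wins = {e: 0 for e in employees}
    pairs = list(combinations(employees, 2))  # every pair exactly once
    for a, b in pairs:
        wins[prefer(a, b)] += 1
    ranking = sorted(employees, key=lambda e: wins[e], reverse=True)
    return ranking, wins, len(pairs)

# Hypothetical judgments: the evaluator's underlying opinion of each employee.
opinion = {"Adams": 7, "Baker": 9, "Clark": 4, "Davis": 6}
ranking, wins, n_pairs = paired_comparison_ranking(
    list(opinion), lambda a, b: a if opinion[a] >= opinion[b] else b
)
print(ranking)   # ['Baker', 'Adams', 'Davis', 'Clark']
print(wins)
print(n_pairs)   # 6 comparisons for 4 employees: n(n - 1) / 2
```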


Absolute Standards

With appraisal systems using absolute standards, individuals are evaluated against one or several written standards. There are two general absolute standards methods. First, there are qualitative methods, in which the evaluator is asked to identify whether the appraisee does or does not possess, in a qualitative sense, a given characteristic. Second, there are quantitative methods, in which the evaluator attempts to measure the degree to which each appraisee possesses certain characteristics.


Qualitative Methods. Critical incidents and forced choice are illustrative of qualitative methods. The critical incident method has been described as a method that provides a picture of individual performance: the rater records on a special form examples of outstandingly good and poor performance on the part of the individual.

This method is not usable in this study, as it would provide nebulous results and be cumbersome to evaluate with many maintenance technicians.

Forced-choice procedures involve a series of groups or clusters of statements about job behavior. The evaluator is asked to choose the item which is most descriptive of the appraisee. Travers (1951) notes that forced choice makes it difficult or impossible for a person to control the quality of the rating. The descriptive statements of job behavior must be developed for each individual job, a procedure that is also not usable in this research situation.

Quantitative Methods. Conventional rating procedures and behaviorally anchored rating procedures are examples of quantitative methods. According to Locher and Teel (1977:246), conventional ratings constitute the most popular form of appraisal technique. Rating scales generally have several statements about employee characteristics or behavior. A continuous or discrete scale is established for each item. Figure 6 illustrates several scaling procedures from Cummings and Schwab (1973). Item A is scaled continuously:

Figure 6

Illustrations of Conventional Rating Scaling Formats
for a Single Item (Cummings and Schwab, 1973)

Item   Scaling Format

A      Overall job performance:   Low |____________________________| High

B      Overall job performance:   1     2     3     4     5

C      Overall job performance:   Low   Below Aver.   Aver.   Above Aver.   High

the evaluator places a check somewhere on the scale to represent his assessment of the appraisee. Item B has a numerical discrete scale, although letters are sometimes used instead of numbers. Item C is also scaled discretely, with adjectives. Discrete scales generally result in greater agreement among raters and hence are preferable to continuous scales, according to Cummings and Schwab (1973). However, the overall validity of rating scales has been questioned. Bayroff, Haggerty, and Rundquist (1952:105) concluded as a result of some extensive work on Army ratings that "Ratings using different types of rating techniques were not markedly different in validity." Their comparison of graphic scales, forced choice, and a controlled checklist with three criteria is shown in Table 3. It is significant

Table 3

Validity Coefficients for Graphic Rating Scales and
Forced-Choice Sections for Various Criteria
(Haggerty and Rundquist, 1952)

                                            Criteria
Ratings                            Rank by      Class      Efficiency
                                   Associates   Standing   Reports

Graphic scale: overall value         .53          .35         .19
Graphic scale: competence
  for duty assignment                .23           ?          .10
Forced-choice pairs                   ?           .25          ?
Controlled checklist                 .31          .26          ?

to note that in Table 3 ranking by associates is a superior criterion when compared with the validity of existing performance measures such as class standing or efficiency report scores. Furthermore, overall value graphic scales are superior to any other rating method investigated.
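Each entry in Table 3 is a validity coefficient, that is, a correlation between one rating method and one criterion. The sketch below computes a Pearson product-moment correlation, the usual form of such a coefficient. The source does not state which coefficient Bayroff, Haggerty, and Rundquist used, and the paired scores below are invented purely for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 pairs)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: graphic-scale ratings and a criterion such as rank by
# associates, recoded so that a larger number means a better standing.
ratings = [6, 8, 5, 9, 7, 4]
criterion = [3, 5, 2, 6, 3, 1]
print(round(pearson_r(ratings, criterion), 2))
```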

An alternative quantitative rating method is the use of behaviorally anchored rating scales (BARS). Millard, Luthans, and Otteman (1976) feel that BARS may represent a substantial improvement over traditional rating approaches. Three basic steps are involved in BARS: (1) critical incidents are used to determine job-related behaviors and important performance dimensions, (2) the job-related behaviors identified in the critical incidents are linked with the appropriate performance dimension, and (3) significant behavioral incidents are numerically scaled to a level of performance. BARS overcome two methodological problems found in conventional ratings: BARS identify the critical items included in an assessment, and they scale these critical items against specified levels of performance. BARS are not, however, applicable in this study, as they require separate scales for individual job responsibilities.
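Step (3) can be pictured with a small sketch. One way it is often carried out is as follows. Several judges rate each incident on the scale. The mean rating becomes the incident's scale value. Incidents on which the judges disagree are dropped. The 7-point scale, the standard-deviation cutoff of 1.5, and the example incidents are all illustrative assumptions rather than values from the sources cited here.

```python
from statistics import mean, stdev

def scale_incidents(judgments, max_sd=1.5):
    """Assign scale values to behavioral incidents for one performance dimension.

    `judgments` maps each incident to the ratings given by several judges.
    An incident's scale value is the mean judged rating; incidents on which
    the judges disagree too much (sample standard deviation above `max_sd`)
    are dropped. Returns (incident, value) pairs sorted from high to low.
    """
    anchors = []
    for incident, ratings in judgments.items():
        if len(ratings) >= 2 and stdev(ratings) <= max_sd:
            anchors.append((incident, round(mean(ratings), 1)))
    return sorted(anchors, key=lambda pair: pair[1], reverse=True)

# Hypothetical judgments on a 7-point scale for a "safety" dimension.
judgments = {
    "Double-checks torque values and records them": [7, 6, 7, 7],
    "Follows technical order steps in sequence": [5, 5, 4, 5],
    "Leaves tools on the flight line after a job": [1, 2, 1, 1],
    "Asks for help on unfamiliar tasks": [2, 6, 4, 7],  # judges disagree
}
for incident, value in scale_incidents(judgments):
    print(f"{value:>4}  {incident}")
```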

Management by Objectives

Management by Objectives (MBO) has been offered by McGregor (1960) and others as an alternative to conventional rating and employee comparison systems. Wikstrom (1966:2) feels that MBO is based on two related concepts: "(1) the clearer the idea one has of what it is one is trying to accomplish, the greater the chances of accomplishing it; and (2) progress can only be measured in terms of what one is trying to make progress toward." MBO is primarily a developmental procedure for individuals rather than an evaluative one. As such, MBO is not applicable to this case.

Direct Measures

All of the procedures described to this point require that employee performance be evaluated or assessed by someone. It is also sometimes possible to obtain information about performance more directly, without the performance behavior being filtered through the evaluative processes of an appraiser.

For instance, it is sometimes possible to measure the productivity of an individual directly. These measures are generally aimed at the quantity (e.g., hourly units of output, monthly gross sales) or quality (e.g., percent of units rejected, scrappage) of output. Unfortunately, no universal quality or quantity measures exist for Air Force maintenance. While quantity measures could be developed using industrial engineering job standards, AFM 66-1, vol. 1 (1975:1-7), mandates that standards be developed to evaluate mechanics' performance in only certain recurring tasks.

These recurring tasks are those which (1) consume a large number of man-hours, (2) involve extremely high-cost components, or (3) require a large amount of equipment or downtime. This limited use of standards thus makes direct quantity measures impossible for most maintenance work.
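To make the limitation concrete, the sketch below computes the two kinds of direct measure named above. The first is a quality measure, the percentage of units rejected. The second is a quantity measure, engineered standard hours earned per actual hour worked. The task names, standard hours, and work records are hypothetical. The quantity measure can only be computed over tasks that have an engineered standard, so most of a technician's hours go unmeasured.

```python
def percent_rejected(units_inspected, units_rejected):
    """Direct quality measure: percentage of inspected units that were rejected."""
    return 100.0 * units_rejected / units_inspected

def standard_efficiency(work_records, standards):
    """Direct quantity measure over tasks that have an engineered standard.

    `work_records` is a list of (task, actual_hours) pairs and `standards`
    maps a task to its standard hours. Returns (efficiency, share of actual
    hours that could be measured); efficiency is None if nothing is covered.
    """
    earned = actual_covered = actual_total = 0.0
    for task, hours in work_records:
        actual_total += hours
        if task in standards:
            earned += standards[task]
            actual_covered += hours
    efficiency = earned / actual_covered if actual_covered else None
    return efficiency, actual_covered / actual_total

# Hypothetical: only engine changes have an engineered standard.
standards = {"engine change": 16.0}
records = [("engine change", 20.0), ("refuel", 1.5), ("tire change", 2.5),
           ("preventive inspection", 6.0)]
print(percent_rejected(200, 6))                  # 3.0
print(standard_efficiency(records, standards))   # (0.8, 0.6666666666666666)
```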

Quality control in the Air Force is measured in a subjective manner, since many maintenance tasks, such as refueling or preventive maintenance, result in no product subject to a rejection or scrap rate. Furthermore, the personnel evaluations required by AFM 66-1, vol. 10 (1977:1-11), are not completed for each individual on any regular basis. In addition, no sampling procedures are specified to ensure that a representative sample of the technician population is evaluated. As Sauer, Campbell, and Potter (1977) discovered, even the results of Air Force evaluations are not usable in a statistical analysis of performance, because of the scoring methods used and the resulting uniformly high performance scores.

Direct measures, while the least questionable source of performance information, are simply not available as a usable source for statistical analysis. Indeed, the existing quality control system makes it difficult to ensure that a sample representative of maintenance technician performance can be obtained.

Suggested Methods

Of the appraisal methods reviewed, the only applicable methods are straight ranking (a comparative procedure) and rating scales (a quantitative, absolute standard). Both methods are based on subjective appraisals of perceived performance. The use of either method in appraising performance is open to discussion. Evidence indicates that using two rating procedures in conjunction with each other increases the accuracy of the final rating, since the rater is forced to consider each appraisee carefully for the first rating procedure before giving his final rating. Campbell, Prien, and Brailey (1960:440) concluded that "Graphic scales following a [performance] checklist show higher apparent validities than the [performance] checklist [alone]." Similar results have been found for graphic scales following a forced-choice report, according to Barrett (1966). It is suggested that in this study graphic scales follow a forced straight ranking appraisal. This method should be applicable for research and should provide rating scale performance values which are normally distributed and acceptable for statistical analysis.
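One hypothetical way to picture the suggested rank-then-rate procedure is sketched below. The supervisor first ranks the group, then places each technician on the 10-point graphic scale, and the two results are checked against each other. The sketch flags any pair whose scale ratings contradict the earlier ranking. Whether and how such conflicts would be resolved, for example by returning the form to the rater, is an assumption and is not part of the method described in the text.

```python
def ranking_conflicts(ranking, ratings):
    """Find pairs whose scale ratings contradict a prior straight ranking.

    `ranking` lists technicians best first; `ratings` maps each technician to
    a graphic-scale value (1-10). A conflict is a pair where the technician
    ranked higher received a strictly lower scale rating. Ties are allowed.
    """
    conflicts = []
    for i, higher in enumerate(ranking):
        for lower in ranking[i + 1:]:
            if ratings[higher] < ratings[lower]:
                conflicts.append((higher, lower))
    return conflicts

ranking = ["Baker", "Adams", "Davis", "Clark"]          # best to poorest
ratings = {"Baker": 9, "Adams": 7, "Davis": 8, "Clark": 4}
print(ranking_conflicts(ranking, ratings))              # [('Adams', 'Davis')]
```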


Rating Scale Errors

The use of ratings rests on the assumption that the human observer is a good instrument of quantitative observation, i.e., that the observer is capable of some degree of precision and some degree of objectivity. Several observer errors do arise in rating scale use, however. These errors include the error of leniency, the error of central tendency, and the halo effect.


The Error of Leniency

Ratings often tend to cluster about a point at the favorable end of any scale used to appraise personnel. This is due to leniency on the part of appraisers. Barrett (1966:23) observes that often, when the descriptive word "average" is included on a scale, more than half the appraisees are given ratings above average. This is a logical impossibility if these individuals are truly compared with others in the organization. To reduce errors of leniency, Guilford (1954:273) suggests eliminating the word "average" from any scale. According to Bittner (1948), additional ways to reduce errors of leniency include the use of ranking and the review of ratings by several levels of supervisors. No published work could be found concerning the effect of peer appraisals on leniency.
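A simple way to watch for leniency in returned forms is to compare the ratings with the scale midpoint. On a 1-10 scale, the sketch below reports the mean rating, its distance above the midpoint of 5.5, and the share of ratings above that midpoint. These indices and the example ratings are illustrative assumptions and are not measures defined in the sources cited.

```python
from statistics import mean

def leniency_summary(ratings, low=1, high=10):
    """Summarize how far a set of ratings sits toward the favorable end."""
    midpoint = (low + high) / 2
    return {
        "mean": mean(ratings),
        "mean_minus_midpoint": mean(ratings) - midpoint,
        "share_above_midpoint": sum(r > midpoint for r in ratings) / len(ratings),
    }

ratings = [9, 8, 10, 9, 7, 8, 9, 10, 6, 9]   # hypothetical, clustered high
print(leniency_summary(ratings))
```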

Error of Central Tendency

As defined by Guion (1965), this error is marked by restricted variability around the center of the scale. Raters tend to put their ratings in the center of the scale when they are not entirely clear as to the meaning of the ratings or when they do not know the person they are rating. Clear definition of rating criteria and the use of immediate supervisors reduce this problem. The use of a few descriptive adjectives in the middle of the scale also creates problems, as appraisal distributions tend to be multimodal and nonnormal (see Figure 7). No published work could be found which determined whether central tendency merely reflects a normal distribution of appraisals over a scale.
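Restricted variability can be checked in a similar spirit. The sketch compares the standard deviation of a rater's scores with the standard deviation that a uniform use of every step of the 1-10 scale would produce. It also reports the share of ratings falling in the two middle steps. Both indices, the 0.5 threshold, and the sample ratings are assumptions. As the text notes, a narrow spread alone cannot show whether the rater is erring or the group is genuinely homogeneous.

```python
from statistics import pstdev

def central_tendency_check(ratings, low=1, high=10, ratio_threshold=0.5):
    """Flag a set of ratings whose spread is narrow relative to the scale.

    Assumes a scale with an even number of steps, so that the two middle
    steps (5 and 6 on a 1-10 scale) straddle the center. The 0.5 threshold
    is an arbitrary illustrative cutoff.
    """
    reference_sd = pstdev(range(low, high + 1))   # spread if every step were used equally
    observed_sd = pstdev(ratings)
    mid_low = (low + high) // 2
    mid_high = mid_low + 1
    share_middle = sum(mid_low <= r <= mid_high for r in ratings) / len(ratings)
    ratio = observed_sd / reference_sd
    return {"sd_ratio": round(ratio, 2), "share_middle": share_middle,
            "restricted": ratio < ratio_threshold}

print(central_tendency_check([5, 6, 5, 6, 6, 5, 5, 6, 7, 4]))
```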

Halo Effect

As defined by Guion (1965), halo is the tendency


Figure 7

Response Distributions Based on Scale Anchors

[Panels: Distribution; Multimodal Distribution]


to rate an individual in the same manner on all traits because of a general, overall impression that can be either favorable or unfavorable. Halo thus results in positive correlation between the traits that are rated. Halo may be reduced by using a format proposed by Stevens and Wonderlic (1934) that calls for rating all appraisees on one trait, then rating them on the next trait, and so on. Guilford (1954) also indicates that one trait per page should be used. Ranking methods, of course, eliminate the halo effect.

In general, the above errors can be reduced by using clear definitions of traits, by concentrating on a single trait at a time, and by avoiding limited descriptive adjectives and words such as "average." It is not known whether peer review deflates ratings (i.e., reduces leniency). It is also not known whether central tendency errors simply reflect normal distributions of appraisal ratings.


Scale Format

Once it is decided what should be rated and by what means, this information has to be communicated to the rater or raters so that they know what to do. This problem of communication is critical to the success of any rating scheme. All raters should rate using the same criteria for the same purpose, to produce usable results that reflect the performance of individuals in the organization. Although this ideal can never be fully met when subjective ratings are used, several considerations related to scale building can improve ratings. Among these considerations are rules for writing scales, rating standards, scale anchors, and the format of rating scales. All of these will be considered.

Rules for Writing Scales

Several authors have provided rules for writing scales. Uhrbrock (1961) provides a useful list of two thousand scaled items. Some of the most important precepts are as follows:

1. Express one, and only one, thought in a scale.

2. Use words the rater understands.

3. Have the raters rate what they observe, not what they infer.

4. Eliminate double negatives.

5. Express thoughts simply and clearly.

6. Keep statements internally consistent.

7. Avoid universal terms such as all and always.

8. Stick to the present.

9. Avoid vague concepts.
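Some of these precepts can be screened mechanically before a form is fielded. The sketch below is a deliberately crude, hypothetical checker for two of them: rule 4 (double negatives) and rule 7 (universal terms). The word lists are assumptions. The checker cannot judge the other rules, which still need a human editor.

```python
import re

UNIVERSAL_TERMS = {"all", "always", "never", "none", "every", "everyone"}   # rule 7
NEGATIONS = {"not", "no", "never", "none", "nor", "cannot"}                  # rule 4

def check_item(item):
    """Return rule violations found in one scale item (crude word matching)."""
    words = re.findall(r"[a-z']+", item.lower())
    problems = []
    universal = sorted(UNIVERSAL_TERMS.intersection(words))
    if universal:
        problems.append(f"rule 7: universal term(s) {universal}")
    negation_count = sum(w in NEGATIONS or w.endswith("n't") for w in words)
    if negation_count >= 2:
        problems.append("rule 4: possible double negative")
    return problems

items = [
    "Completes assigned inspections on schedule.",
    "Always follows safety procedures.",
    "Does not fail to report discrepancies that are not obvious.",
]
for item in items:
    print(item, "->", check_item(item) or "ok")
```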

Rating Standards

The rater who has been informed about the job of those he is to rate is still not equipped to do an evaluation; he must know the standards against which the rating is to be made. These standards are based on the previously discussed consideration of the types of ratings, the purpose of the ratings, and the organizational setting in which they are arrived at. To aid in providing a framework for clearly expressing one thought in a scale, Barrett (1966:77) provides three related standards against which performance is generally rated: comparison with others, comparison with job standards, and comparison with absolute standards. Job and absolute standards do not exist for all tasks performed by Air Force maintenance technicians. As a consequence, the only available standard is comparison with others. In this case the appraisee is evaluated in relation to other people in some specified group. Although any group may be specified, the most pertinent is made up of workers on the same job or on a similar one. Such comparison is made most directly when the rater is asked to rank a group of employees.


Scale Anchors

Scale anchors are numbers, words, or phrases used to tell the rater the significance of making his rating at a given point. Taylor et al. (1958) found that formats incorporating behavioral descriptions of scale levels were superior in reliability to numerically anchored scales. However, a criticism commonly leveled at the use of behavioral descriptions such as "excellent," "highly favorable," "fair," and "poor" is that the words do not have a common meaning. Careful work by Jones and Thurstone (1955) contradicted this criticism and supported Taylor's findings. They also discovered that scales in which the end points are wider apart give more reliable results than do those in which the spread is constricted. When only end anchors are given, there is less error at the extremes than at the central value.


For the present study, a quality scale could be anchored with the adjectives "lowest" and "highest," while


Considerable attention has been paid in experimental psychology to the methods of scaling, to find out all that can be learned about man as a measuring instrument. Experience has shown that certain rules are favorable to effective graphic ratings. Guilford (1954) lists the following rules:

1. Each trait should occupy a page by itself.

2. The line should be at least five inches long, but not much longer.

3. The line should have no breaks or divisions.

4. The "good" or "high" ends of the lines should be in the same direction.

5. For unsophisticated raters, the "good" end should be placed first.

6. Descriptive phrases or cues should be ... as much as possible ...

7. End cues should not be so extreme in meaning that they will never be applied.

8. End cues should be set at a little distance from the ends of the line.

9. In scoring, a stencil should be used that divides each line into sections to which numerical values are assigned.
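Rule 9 amounts to a simple conversion from the position of a check mark to a step number. The sketch below shows one way such a stencil could be applied. It assumes a 5-inch line divided into ten equal sections, the length and number of steps proposed later in this study. The function name and the handling of marks that fall exactly on a boundary or at the ends of the line are assumptions.

```python
def stencil_score(mark_inches, line_length=5.0, steps=10):
    """Convert a check mark's distance from the low end of the line to a step.

    The line is divided into `steps` equal sections numbered 1 (low end) to
    `steps` (high end). A mark exactly on a boundary is assigned to the higher
    section, except at the very end of the line, which belongs to the top step.
    """
    if not 0.0 <= mark_inches <= line_length:
        raise ValueError("mark must lie on the line")
    section_width = line_length / steps
    return min(int(mark_inches // section_width) + 1, steps)

for mark in (0.0, 0.3, 2.5, 4.9, 5.0):
    print(mark, "->", stencil_score(mark))
```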

The number of steps in a scale varies. Bendig (1954) reports on experiments in ratings in which he found that satisfactorily high reliabilities were obtained on scales involving three to nine levels. The Air Force currently uses ten levels for Airmen Performance Ratings, a form with which all military technicians are familiar. The use of up to ten levels in this case is supported by Barrett (1966), who notes that raters can make finer distinctions when the scale calls for judging the differences between two people than when rating a person against a standard.

Barrett (1966) also feels that much discussion but little research has centered on the problem of an odd or even number of scale steps. Even-numbered scales deny the rater the use of the term "average" as a rating, the easiest rating to make. Odd-numbered scales, on the other hand, allow average ratings, as there should be more average people than any other kind. There is no conclusive evidence with which to resolve the issue; the presence or absence of a central point when more than five levels are used probably does not make much difference.

It thus appears that the best scale format for this study should follow the rules listed by Uhrbrock and Guilford. The rating standards should be based on comparisons with other technicians within a particular maintenance squadron. Using two adjectives to anchor the ends of scales for quality and quantity of performance appraisal should serve several purposes: (1) the term "average" would be avoided, (2) generality of the scale would be maintained to make it applicable to many maintenance activities, (3) the possibility of obtaining a normal performance distribution would be improved, and (4) multimodal distributions grouped around descriptive adjectives would be avoided. Ten steps, corresponding to increasing levels of whatever trait is being rated, should allow for fine distinctions between technicians.


The Rater

There are five possible parties to appraising: (1) the supervisor of the appraisee, (2) organizational peers of the appraisee, (3) the appraisee himself, (4) subordinates of the appraisee, and (5) persons outside the immediate work environment of the appraisee. Any of these parties might be appropriate, depending on the purpose (either evaluative or developmental) of the appraisal and the dimensions (either outcomes or methods) being appraised. This study is primarily concerned with an evaluative purpose based on outcomes.


Supervisory Appraisal

There are two primary justifications for centering the appraisal process on the performer's superior. First, the hierarchy of formal authority which exists in most organizations legitimizes the right of the superior to make evaluative and developmental decisions concerning his subordinates. Second, Miner (1963) and Van Zelst and Kerr (1953) have shown that the supervisor is the person most employees want and probably expect to appraise them. Thornton (1968) has shown that supervisor ratings are valid, while Barrett (1966) has shown them to be reliable if care is taken to train appraisers and to use an acceptable appraisal form.


If supervisors of maintenance technicians are used as raters, however, ...

Whitla and Tirrell (1953) indicate that raters on the supervisory level functionally closest to the ratees are best able to rate them. Supervisor ratings are thus the most applicable rating scheme available for rating Air Force maintenance personnel performance.


According to Lewin and Zwany (1976), peer ratings have been empirically shown to have high validity in the prediction of diverse future performance criteria. However, current performance, not future performance, is of interest in this study. Furthermore, peer appraisals have seldom been used for evaluation, so supervisor appraisals are used in this study. Immediate supervisors are knowledgeable about their personnel and the desired performance in a particular maintenance specialty.

Recommendations and Conclusions

One of the greatest needs of managers of the military weapons system maintenance complex is to measure accurately how well individuals perform on the job. Individual job performance forms one of the bases for performance by the entire maintenance organization. And if the effectiveness ... individual performance can be ...

Conclusions

The basic problem in measuring current individual performance is that existing rating schemes are either not applicable to statistical analysis or are highly inflated and unusable for research. Airmen Performance Ratings and ... are used for the administrative purposes of promotion and ..., and seldom reflect job performance alone. Other existing measures are based either on paper-and-pencil theory tests, which do not reflect job performance, or on infrequent personnel evaluations, which are highly skewed, unusable for statistical analysis, and reflect compliance rather than performance capability. Since existing data are not applicable, a new rating scheme must be developed.

Any new performance rating scheme must: (1) be applicable to all levels of the maintenance organization, (2) have high-quality, useful criteria measures, (3) have valid measures, (4) be free of error, (5) have an accurate format, and (6) be used by an appropriate rater. In the first place, the size of the Air Force maintenance organization requires a measurement scheme applicable to civilian and military technicians of all races and sexes performing many tasks, ranging from servicing aircraft to repairing missile systems. Most maintenance of any type is performed by groups of five to ten technicians supervised by one individual. It thus appears that the organizational structure restricts performance measures to general quantity and quality of performance based on appraisals by immediate supervisors.


Secondly, any new performance measure must be of high quality, i.e., it must have face validity. This means it must be acceptable to the people who use it, as well as being relevant for management. Acceptability can be determined by using a measure similar to existing measures or by surveying maintenance management personnel concerning performance by technicians. Relevance can be achieved by avoiding deficiency, contamination, and distortion; avoiding these problems depends, to some extent, on what is rated, how it is rated, and by whom.

Thirdly, the performance rating must have valid measures. Objective rating criteria are simply not available or not useful to this study. Therefore subjective trait criteria are the only ones being considered. Of such subjective traits, performance or product criteria appear to be the most valid according to existing literature on the subject. And of those categories available, individual quality and quantity of performance by technicians are the most applicable. These concepts make it easy to compare personnel, provide better agreement between raters, and provide information useful to management. In particular, quantity and quality of performance are easy to relate to individual motivation and capability in an overall performance model.

Fourth, the performance rating must be as free of error as possible. Of the appraisal methods reviewed, the only applicable methods for this study are straight ranking (a comparative procedure) and rating scales (a quantitative, absolute standard). Both methods are based on subjective appraisal of perceived performance. Evidence indicates that the use of graphic scales following a forced ranking increases the accuracy of the ratings. This method should be applicable for research in the present study and should provide rating scale performance values which are normally distributed and acceptable for statistical analysis.

Fifth, the rating scale must have an accurate format. The actual rating scale proposed by this study is designed to minimize errors of leniency, of central tendency, and of the halo effect. The quality and quantity of performance appraisal forms proposed (Appendices B and C) are adapted from Sauer, Campbell, and Potter (1977). The directions have been edited to conform to Uhrbrock's (1961) rules for appraisal forms, i.e., thoughts are expressed clearly and simply, statements are internally consistent, and words that the raters understand are used. To conform with Guilford's rules (1954), each trait occupies a page by itself, the scale line is five inches long, and ten steps corresponding to increasing performance compared to other technicians in a squadron are used to provide an adequate spread of responses.

Finally, the performance rating must be used by an appropriate rater. For this study the use of immediate supervisors as performance raters is the most appropriate technique. Personal supervision of supervisors, either in a group or on an individual basis, serves to enhance the quality of ratings and to make training unnecessary. It might also be useful to have peers complete appraisals in an attempt to reduce rater leniency.

In short, the suggested rating forms in Appendices B and C are the best that can be developed based on a review of the literature concerning appraisals and on the nature of the Air Force maintenance organization. These rating forms have face validity, if previous research conclusions are accepted. It remains to be seen, however, whether the suggested rating forms actually prove useful to maintenance management and prove to be statistically valid as a measurement of performance. In any case, the suggested rating forms should provide useful maintenance personnel performance data for use in developing a model which accurately explains the contribution of individual motivation and ability to Air Force maintenance.


Chapter  3 


METHODOLOGY 

This chapter deals with the procedure used to test the recommended performance measures of quantity and quality. In addition to the test utilizing maintenance personnel at Williams AFB, Arizona, a small, independent survey was distributed to determine the opinions of several maintenance managers at other bases on the usefulness of the performance measures.

The following topics will be considered in Chapter 3:

1. The data gathering instruments.

2.  The  sample  selection  procedures. 

3.  The  specific  data  collection  procedures. 

4.  The  plans  for  analysis  of  the  data. 

A  discussion  of  assumptions  and  limitations  of  the 
methodology  will  be  included  in  the  chapter  summary. 

The  Data  Gathering  Instruments 

The appraisal forms developed in Chapter 2 (Appendices
B and C) are the primary data sources for this study. Based
on the conclusions of the previous chapter, these forms were
developed with the intention of providing useful performance
information for statistical analysis.

In addition to the appraisal forms, a limited number
of maintenance officer opinions were solicited concerning
the validity of the proposed appraisal methods. It was felt
that a general survey of maintenance officers would have
been overly time-consuming and costly. The overall results
of such a survey would have revealed a consensus opinion of
average officers, whereas the opinions of five or six
officers who have excellent performance records may be
considered more relevant. An example of the officer opinion
questionnaire and its cover letter is contained in Appendix
K. For such a small survey, free-form answers were solicited
rather than multiple-choice or two-way answers; this format
was suggested by Neter and Wasserman. The first
question on the survey and the cover letter were designed to
establish rapport with the respondent. All questions were
simple and were designed to be clear, to avoid leading the
respondent, and to eliminate bias. The questions included
the following:




1.  Is  individual  performance  important  to  the 
maintenance  organization? 

2. Are the ranking forms appropriate for appraising
performance or can you suggest a better approach?

3. Are quantity and quality useful measures of
performance?

4. Which do you consider to be more important,
quantity or quality?

5.  If  one  is  more  important  than  the  other,  can  you 
indicate  how  much  more  important  it  is? 

The responses to this survey questionnaire were intended to
indicate whether the proposed performance rating forms have
face validity and how quantity and quality ratings might be
combined into a single measure of performance.

Several existing sources could theoretically provide
data to validate the accuracy of the performance appraisals.
Among these are Airmen Performance Ratings (APRs) and Skill
Knowledge Tests (SKTs). However, both the APRs and SKT
results are Air Force privileged information and were not
available to this author as data sources. Further, APRs have
a history of being highly inflated, according to Callander,
while SKTs are paper and pencil tests administered
only to test selected skills and at uneven intervals. Thus,
neither would in actuality be an appropriate source for
validating the performance ratings. In fact, there are no
available Air Force records that would be useful for
validation of speed of performance (quantity) rating
data.


The source used in this research for a possible
validation of at least the quality of performance rating
scale may be referred to as maintenance quality control (QC)
information on technician inspections under the Air Force
Maintenance Standardization and Evaluation Program (MSEP).
MSEP personnel performance scores are based on separate
failure levels or baselines for each type of maintenance
task. Although they provide the best existing performance
data for validation use, difficulties may be encountered in
using MSEP. For instance, personnel evaluations are neither
required nor completed for any one individual on any regular
basis; thus some of the technicians in any random sample of
technicians may not have MSEP records.


Summary  of  Data  Needs 

The data to be gathered for this research thus
includes (1) supervisor appraisals of technicians reporting
to them, using the recommended rating forms (Appendices B
and C), (2) surveys of a few selected maintenance officers
(Appendix K), and (3) MSEP reports available for most of the
technicians drawn in the sample. The sample instruments
for (1) above were reviewed and approved by the Air Force
Military Personnel Center, Randolph AFB, Texas. Human
subject clearances and a privacy statement example are
included in Appendix A.


Sample Selection Procedures

Selection of the maintenance technicians for the
sample proved to be difficult. It was desired that only line
technicians be evaluated, and that they be evaluated by their
immediate supervisors, who are responsible for scheduling and
inspecting assigned tasks. No existing source document used
by the Air Force appears to reflect this information on an
accurate and current basis. A complete listing of all
maintenance line supervisors and their immediate subordinates
is necessary if a randomly drawn sample of shift supervisors
and their subordinates is to accurately reflect the entire
maintenance organization of a chosen Air Force base.

The existing personnel listings of rating officials
responsible for preparation of annual APRs do not reflect
current work group assignments. For instance, in examining
the personnel listing at Williams AFB, Arizona, it was
found that one supervisor was shown with six subordinates,
none of whom currently reported to him. Another supervisor
on this same personnel listing had three assigned
subordinates listed, one of whom was not currently assigned to
him, while he actually supervised an additional seven who were
not listed. These listings therefore could not be used
in selecting the sample.

Another data base, the Maintenance Management
Information and Control System (MMICS), maintains a file of all
assigned personnel, but this file does not identify supervisors
or their immediate subordinates. As a consequence, it was
necessary to obtain a current roster from each maintenance
section prior to drawing the technician sample for the study.

In this study supervisory groups were randomly drawn,
using random numbers chosen by dice rolls to decide the
particular shifts to be included in the sample. At least
three and no more than five subordinates were selected to be
evaluated by each supervisor. Where more than five subordinates
reported to one supervisor, five subordinates were selected
for the study, again by using random number tables. Alternates
were also selected by this method where more than five
technicians were encountered in a group. This stratified
selection method allowed the researchers to control sample
participation, to eliminate supervisor bias, to obtain a
representative sample, and to allow supervisors to evaluate
enough subordinates (up to five each) to obtain valid
comparisons.
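The two-stage selection rule described above (draw eligible supervisory groups, then draw at most five technicians plus alternates within larger groups) can be sketched as follows. This is a minimal illustration under stated assumptions: the current section rosters are assumed to be held as a mapping from a hypothetical supervisor ID to that supervisor's technicians, a seeded pseudo-random generator stands in for the random number tables and dice used in the study, and the function name `draw_sample`, the number of groups drawn, and the number of alternates per group are illustrative choices rather than details reported here.

```python
import random


def draw_sample(groups, n_groups, min_size=3, max_size=5, n_alternates=2, seed=None):
    """Draw supervisory groups, then technicians within each drawn group.

    groups: dict mapping a (hypothetical) supervisor ID to the list of
    technicians currently reporting to that supervisor.
    Returns {supervisor: (selected technicians, alternates)}.
    n_groups and n_alternates are illustrative parameters.
    """
    rng = random.Random(seed)  # stands in for random number tables and dice
    # Only groups with at least min_size technicians are eligible.
    eligible = sorted(s for s, members in groups.items() if len(members) >= min_size)
    chosen = rng.sample(eligible, n_groups)
    sample = {}
    for supervisor in chosen:
        members = list(groups[supervisor])
        if len(members) <= max_size:
            # Small groups are evaluated in full, with no alternates.
            sample[supervisor] = (members, [])
        else:
            # Larger groups: put members in random order, take the first
            # max_size as the sample and the next n_alternates as alternates.
            order = rng.sample(members, len(members))
            sample[supervisor] = (order[:max_size], order[max_size:max_size + n_alternates])
    return sample
```

Drawing alternates from the same random ordering keeps them disjoint from the selected technicians, which mirrors the idea of replacing a technician who is unavailable without re-drawing the whole group.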

Sample size was set at ninety technicians per
squadron (approximately twenty percent of the population)
based on the sample sizes used in similar studies. Eighteen
corresponding supervisors per squadron, each rating up to five
technicians, completed technician evaluations. As the maintenance
organization at Williams AFB is made up of two squadrons, the
total sample drawn consisted of 169 technicians (some supervisory
groups had fewer than five technicians) and thirty-six
supervisors. No research dealing with Air Force maintenance
technician performance has reported useful information on the
effect of sample size on statistical tests. This research should
provide information on adequate sample sizes for minimizing
the probability of erroneously accepting a false null hypothesis
(Type II error) for certain statistical tests.
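To make the link between sample size and Type II error concrete, the sketch below uses the normal approximation for a two-sided comparison of two means with a common standard deviation. The function `type_ii_error` and all inputs are assumptions for illustration only; this chapter does not report such a calculation, and the study's own tests (described in Chapter 4) are not z tests.

```python
from statistics import NormalDist


def type_ii_error(delta, sigma, n1, n2, alpha=0.05):
    """Approximate Type II risk (beta) of a two-sided two-sample z test.

    delta: assumed true difference in means; sigma: assumed common
    standard deviation; n1, n2: sample sizes. Normal approximation only.
    """
    z = NormalDist()
    se = sigma * (1 / n1 + 1 / n2) ** 0.5  # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # beta = P(statistic lands inside the acceptance region | true difference delta)
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)
```

For example, `type_ii_error(0.5, 2.1, 80, 89)` gives the approximate chance of missing a half-point difference on a ten-point scale with squadron-sized samples; larger samples shrink the standard error and therefore the Type II risk.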

As for the maintenance officers selected to answer
the independent survey questionnaire, they are personal
acquaintances of the author and have all managed maintenance
operations graded as excellent or better by Major Command
inspectors. The majority are now retired and should thus
have felt no restrictions in supplying candid answers.

As has been noted, sample selection for this study
was very time-consuming. To begin with, much time was spent
in evaluating existing rosters of personnel before it was
determined that the rosters were inadequate for drawing
the names of maintenance personnel and their immediate shift
supervisors. The sample was randomly drawn from all
supervisory groups with at least three technicians reporting
to a shift supervisor. Details of the exact data
collection process follow.


Data  Collection 

The sample data was collected at Williams AFB,
Arizona. To protect the privileged nature of the supervisor
performance appraisals and to meet Privacy Act requirements,
control numbers were assigned to the technician participants in
the study, and the researchers supervised all appraisal and
evaluation sessions. All participants completed the survey
forms in a central location during specified time periods
which allowed for participation by personnel from all three
shifts.

The  independent  officer  survey  questionnaires  were 
mailed  to  eight  maintenance  officers  and  responses  were 
received  from  four,  as  the  remainder  had  moved  and  left  no 
forwarding  addresses. 
In Chapter 2 it was theorized that peer appraisals
might deflate ratings; it was also noted that no research
had been done in this area. No peer appraisals were attempted
in this study either, due to time constraints, to the
difficulty of providing the necessary information to all
169 participants, and to the wide range of experience levels
found among any five technicians within a supervisory group.
The sample size simply made peer appraisals too difficult to
administer, which may also be the reason why this method had
not been used in the past.


Data collection was spread over a period of
two-and-one-half weeks. Unfortunately, during the survey period
several technicians changed shifts and supervisors. The
collection time period also allowed for changes in work
requirements. These difficulties were relatively minor,
however, and the data collection methods were successful.


Data  Analysis 

In analyzing the data obtained from the Williams AFB
sample, the first consideration is to determine if the
organization is adequately represented in the sample. In
this case, the maintenance organization is small enough to
compare the sample and the base maintenance population
with respect to several characteristics.

Statistical analysis of the data should then
establish whether it is suitable for use in a regression analysis
and whether there are significant differences between quantity
and quality ratings. Next, comparisons will also determine whether
quantity is related to quality, based on the supervisors'
subjective appraisals of the technicians. Finally, an
attempt to validate the quality rating scale will use
correlation analysis to determine whether any linear relation exists
between MSEP data and the performance quality data of this
study.
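The planned correlation analysis pairs each technician's MSEP score with the supervisor's quality rating, using only technicians who have MSEP records. A minimal sketch of the Pearson product-moment coefficient, and of the usual t statistic for testing zero correlation, is shown below; the function names `pearson_r` and `t_for_r` are illustrative, not taken from the study or from the BMD programs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of paired observations x[i], y[i]."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic with n - 2 degrees of freedom for testing zero correlation."""
    return r * sqrt((n - 2) / (1 - r * r))
```

A significant t only indicates a linear association; with skewed ratings such as those examined in Chapter 4, a rank-based coefficient would be a reasonable cross-check.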

Descriptive statistics will also be generated by
the Biomedical (BMD) Detailed Data Description program.

The small number of maintenance officers surveyed concerning the
relevance and usefulness of the appraisal forms may produce
results of limited use. Finally, the lengthy data gathering
period of two-and-one-half weeks allowed for a number of
changes in personnel and policy which may affect results.
Most of these limitations will be addressed in the next
chapter on analysis.


Chapter  4 
ANALYSIS 

The  analytic  purpose  of  this  study  is  to  develop  a 
subjective  performance  rating  method  that  provides  the 
following: 

1. Performance data applicable to regression analysis
as a dependent (Y) variable.

2. Performance data that has some apparent or actual
validity compared to existing performance measures.

3. Performance data with desirable statistical
properties.

4. Performance data that accurately reflects the
organizational composition.

The analyses of the sample data that will be discussed in
this chapter are based on the above requirements. First, the
sample will be analyzed to determine if it reflects
organizational composition. Secondly, the sample data's
statistical properties will be considered. Thirdly, the quantity
and quality ratings will be compared. Fourthly, the
association between existing performance measures (MSEP)
and the sample quality ratings will be investigated.
Finally, the maintenance officer responses to the opinion
survey will be summarized.


Sample  Composition 


All of the technicians and supervisors included in
the sample were members of the maintenance organization at
Williams AFB, Arizona, a USAF pilot training base. This
maintenance organization differs from most base organizations
in that the avionics repair function is a branch of the
Field Maintenance Squadron (FMS) and not a separate squadron.
The Williams AFB organization structure, depicted in Figures
8 through 11, is otherwise comparable to the general Air
Force organization structure depicted in Figures 1 through
5. The actual number of line technicians was extracted from
the Maintenance Management Information and Control System
(MMICS); these figures may differ from authorized strength
limits and, since supervisors are excluded, may not coincide
with squadron strength figures.


The random sample of 169 technicians was intended to
represent the entire maintenance organization. Eighty
technicians were selected from the Operational Maintenance
Squadron (OMS), or 22.6 percent of the squadron. Eighty-nine
technicians were selected from the Field Maintenance
Squadron (FMS), or 13.7 percent of the squadron. The grade
distribution of the sample closely parallels that found in
the organization (see Table 4). The FMS sample and squadron
are composed primarily of sergeants and civilians, while
OMS is primarily composed of airmen.

The relative representation of squadron branches in
the sample is depicted in Table 5. The OMS sample closely
parallels the squadron population. The FMS sample is not as
representative due to the large number of work sections with
few technicians.

The OMS sample included eight day shift supervisory
groups, eight swing shift groups, and two mid (i.e., graveyard)
shift groups. The FMS sample included twelve day
shift groups and six swing shift groups; FMS had few swing
shift groups and few mid shift groups with three or more
technicians assigned.
Table 5

Relative Squadron Branch Strengths
versus Relative Sample Branch Strengths

Organization              Squadron Population    Sample

OMS Branches:
  T-37 Flight                      33               25
  T-38 Flight                       ?               36
  T-37 Inspect.                    22                6
  T-38 Inspect.                     ?                9
  ? Inspect.                        ?                ?
  Rep. and Reclamation              ?                ?
  Support Inspect.                  ?                ?
  Total                           353               80

FMS Branches:
  Propulsion                      160                ?
  Fabrication                     105                ?
  Aerospace Systems                 ?               23
  ?                                32                6
  Avionics                         63                ?
  Total                             ?               89

(? = value illegible in the source copy)

performance for OMS and FMS are presented in Figures 14 and
15. According to Bradley (1963:55), the discrete distribution
case can be treated in the same way as the continuous
distribution case. Difficulties arise in the use of
non-parametric tests, however, due to the large number of equal
values (ties) and the potential loss of test power.

The sample means and standard deviations were
calculated using the BMD Detailed Data Description program (P2D).
This information is included in Appendix D. This program
also computed values for skewness and kurtosis, which will be
discussed following an analysis of the MSEP data.



Figure 14
FMS Histograms
(Statistics from Appendix S, BMDP analysis; the number
of technicians recorded in each interval is noted
at the top of each frequency column)

The Maintenance Standardization and Evaluation
Program (MSEP) data is also included in Appendix D. This
is the existing information with which quality ratings are
to be compared in an attempt to establish some validity for
the subjective quality ratings. The MSEP data represents
two years of personnel inspections, while the subjective
ratings represent supervisors' appraisals at one particular
point in time. The MSEP personnel evaluations are based
on compliance with equipment specifications: failure of an
evaluation represents either a major safety discrepancy or
the accumulation of more than the allowed number of minor
discrepancies for a particular task. Inspections take into
consideration completed maintenance actions, completed
maintenance inspections (CMI), and task evaluations.
Since some supervisors also work as line technicians,
completed supervisory inspections (CSI) and supervisor
evaluations (SE) are also included. The MSEP score used in
this analysis is based on a weighted average of all
inspections subtracted from one, so that an individual MSEP score
of 1.0 indicates that no discrepancies were noted during
any inspection of a particular technician's work.
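The score construction described above ("a weighted average of all inspections subtracted from one") can be sketched as follows. The weighting of inspection types (CMI, CSI, SE, task evaluations) is not specified in this section, so the discrepancy scale and the weights here are hypothetical: each inspection is represented as a pair of a discrepancy level between 0.0 (no discrepancies) and 1.0 (failure) and a positive weight. The function name `msep_score` is illustrative.

```python
def msep_score(inspections, min_inspections=1):
    """Sketch of an MSEP-style score: 1 - weighted mean discrepancy level.

    inspections: list of (discrepancy_level, weight) pairs, where the
    level runs from 0.0 (no discrepancies) to 1.0 (failure) and the
    weight is positive. Both the scale and the weights are hypothetical.
    Returns None for a technician with fewer than min_inspections
    inspections (min_inspections=2 mimics the "Rev. MSEP" subset).
    """
    if not inspections or len(inspections) < min_inspections:
        return None  # technician has no usable MSEP record
    total_weight = sum(w for _, w in inspections)
    weighted_level = sum(d * w for d, w in inspections) / total_weight
    return 1.0 - weighted_level
```

Returning `None` rather than a default score keeps technicians without inspections out of any later correlation with the quality ratings.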

The management of the maintenance organization at
Williams AFB requires that every technician receive an
would be incomplete and unusable as a re
for the entire organization. It is also notable
that of those technicians inspected who were included in the
sample, OMS technicians received an average of 2.7 inspections
in the past two years, compared to 1.7 inspections in
the past two years for FMS technicians.

The histograms of the quality ratings and MSEP scores
for those technicians in the sample who received inspections
are presented in Figures 16 and 17. The data concerning
sample means and standard deviations was also obtained
using the BMDP program. A preliminary inspection of the
histograms indicates that the higher average number of
inspections of technicians in OMS deflated OMS MSEP ratings
in comparison to FMS MSEP ratings.

Based on the histograms presented in this section,
the quality and quantity of performance rating data will
now be analyzed with regard to important statistical
qualities. Most significant is the relation between the
histogram distributions and the normal distribution.


Applicability to Regression Analysis


The maintenance technician ratings were designed for
use in regression analysis as the dependent (Y) variables.
One of the assumptions of multiple linear regression is
that the observations for the dependent variables are
independent and drawn from a normal distribution (Neter and
Wasserman); normality is also required for most
parametric tests that could be used to compare the quantity
and quality histograms.

Figure 16
(Statistics from Appendix J, Correlation Analysis; the
number of technicians recorded in each interval is
noted at the top of each frequency column)

Figure 17
OMS Quality Ratings and MSEP
(n = 63; x̄ = .700; s = .263)
(Statistics from Appendix J, Correlation Analysis; the
number of technicians recorded in each interval is
noted at the top of each frequency column)

Tests which can be used to test the normality
assumption include the Kolmogorov-Smirnov and Chi-Square
goodness-of-fit tests. The Kolmogorov-Smirnov test is not applicable
in this case, as the population parameters cannot be
specified in advance of testing (Lapin). The Chi-Square
test does allow the use of sample estimators for
testing population parameters. The Chi-Square test, however,
is more difficult to interpret, and the probability of
accepting a false null hypothesis (Type II error) is not
well defined (there are several ways in which the null
hypothesis could be false). Calculations for the Chi-Square
goodness-of-fit test for the FMS quality of performance data
are included in Appendix F. A summary of the Chi-Square
results for all of the sample data is included in Table 7.
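The goodness-of-fit procedure described above can be sketched as follows: fit a normal distribution using the sample mean and standard deviation, compare observed and expected counts cell by cell, and subtract one degree of freedom for the total plus one for each estimated parameter. This is a generic illustration, not a reproduction of the Appendix F calculations; the function name and the choice of cell boundaries are assumptions. Half-integer boundaries are used so that tied ratings on a 1 to 10 scale never fall on a boundary, and in practice cells with small expected counts are commonly pooled, which reduces the degrees of freedom.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normal_gof(data, edges):
    """Chi-square goodness-of-fit statistic against a fitted normal.

    The normal's mean and standard deviation are estimated from data.
    edges: ascending interior cell boundaries; the cells are
    (-inf, e0], (e0, e1], ..., (e_last, +inf).
    Returns (chi-square statistic, degrees of freedom).
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    chi2 = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for v in data if lo < v <= hi)
        expected = n * (fitted.cdf(hi) - fitted.cdf(lo))
        chi2 += (observed - expected) ** 2 / expected
    cells = len(bounds) - 1
    df = cells - 1 - 2  # minus one for the total, two for the estimated mean and sd
    return chi2, df
```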

The small Type I error significance levels (alpha)
indicate poor fits of all the data to the normal distribution.
For instance, the .01 level of significance for the OMS
quality data indicates that if a distribution were normal
with a mean of 7.2 and a standard deviation of 2.0, then
there would be a 1 percent chance that a sample could be
obtained from this distribution yielding a Chi-Square value
equal to or greater than 6.96. Similar interpretations of
the remaining Type I error levels indicate that all the
sample data have relatively low probabilities of being from
normal distributions.
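The interpretation above can be checked numerically. For one and two degrees of freedom the upper-tail probability of the chi-square distribution has closed forms: a chi-square variable with one degree of freedom is the square of a standard normal, so P(X >= x) = erfc(sqrt(x / 2)), and with two degrees of freedom P(X >= x) = exp(-x / 2). The helper below is an illustrative sketch that covers only these two cases; other degrees of freedom would need a chi-square table or a statistics library.

```python
from math import erfc, exp, sqrt


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable X; closed forms for df = 1 and 2 only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return erfc(sqrt(x / 2))  # X = Z**2, so P(|Z| >= sqrt(x))
    if df == 2:
        return exp(-x / 2)  # exponential distribution with mean 2
    raise NotImplementedError("use a chi-square table for other degrees of freedom")
```

For the OMS quality row of Table 7 (a Chi-Square value of 6.96 with one degree of freedom), `chi2_upper_tail(6.96, 1)` is about 0.008, consistent with the .01 level quoted above.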

Table 7

Chi-Square Normal Distribution
Goodness-of-Fit Tests

Sample   Distribution    Fit to N(x̄; s²)    X²          df(a)   Type I Risk(b)   n
OMS      Quality         N(7.2; 2.0²)       6.96        1       .01              80
OMS      Quantity        N(6.6; 2.1²)       4.60        1       .02              ?
OMS      MSEP            N(.70; .26²)       9.79        ?       .001             63
OMS      Rev. MSEP(c)    N(.66; .24²)       2.73        ?       .25              52
FMS      Quality         N(7.3; 2.1²)       17.?8(d)    ?       .001             ?
FMS      Quantity        N(6.6; 2.1²)       6.41        2       ?                ?
FMS      MSEP            ?                  ?           ?       ?                ?

(a) df = degrees of freedom.
(b) Type I risk is the alpha level of significance.
(c) Rev. MSEP includes inspection results for technicians with
    two or more inspections.
(d) See Appendix F for Chi-Square calculations.
(? = value illegible in the source copy)

The relatively poor fit of the sample data to normal
distributions, fortunately, has minimal effect on the
multiple linear regression model with which the data will
be used. The regression coefficient parameter estimators
will remain unbiased and consistent, though not highly
efficient. If the lack of normality is significant enough,
the error terms (residuals) of the regression model may not
have constant variance (a condition defined as
heteroscedasticity). This condition can best be investigated after
a preliminary regression model is developed; if non-constant
variance results, a transformation of the performance
data can be made to correct the problem. For instance,
if the error term standard deviation is proportional to the
square of the factor level mean, the reciprocal
transformation stabilizes variances (Neter and Wasserman; 1974).
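Why the reciprocal transformation works in this case can be shown with a small simulation. If the standard deviation of Y is k times the square of its mean, the delta method gives sd(1/Y) of roughly sd(Y) / mean**2 = k, the same for every group. The data below are synthetic and the constant k, the group means, and the sample size are arbitrary assumptions chosen only to illustrate the effect; they are not study data.

```python
import random
from statistics import stdev


def simulate_groups(means, k=0.01, n=2000, seed=0):
    """Synthetic groups whose standard deviation is k * mean**2 (hypothetical data)."""
    rng = random.Random(seed)
    return {m: [rng.gauss(m, k * m * m) for _ in range(n)] for m in means}


def spread_ratio(sds):
    """Largest group standard deviation divided by the smallest."""
    return max(sds.values()) / min(sds.values())


groups = simulate_groups([2.0, 4.0, 8.0])
raw_sd = {m: stdev(ys) for m, ys in groups.items()}
# Reciprocal transformation Y' = 1/Y: each group's sd should now be close to k.
recip_sd = {m: stdev([1.0 / y for y in ys]) for m, ys in groups.items()}

print(f"raw spread ratio: {spread_ratio(raw_sd):.1f}")
print(f"reciprocal spread ratio: {spread_ratio(recip_sd):.2f}")
```

The raw standard deviations differ by a factor of about sixteen across groups, while after the transformation they are nearly equal. A different relation between spread and mean would call for a different transformation (for example, a logarithm when the standard deviation is proportional to the mean).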

It is often the case that the same transformation which helps
stabilize the variance also helps normalize error terms.
Neter and Wasserman (1974:123) state that:


It is therefore desirable that the transformation for
stabilizing the error variances be utilized first,
and then the residuals studied to see if serious
departures from normality are still present.


It is thus apparent that normality of the sample
distributions is not a serious initial concern for regression
analysis. Non-normality does have serious effects on tests
which can differentiate between the mean and variance of two
sample distributions. The skewed nature of the distributions
appears to be the particular quality leading to the poor
comparison of the sample distributions to normal
distributions. All of the sample distributions are skewed, as
indicated by the data in Table 8.

Table 8

Skewness and Kurtosis of Sample Data

                   OMS                      FMS
            Quantity   Quality      Quantity   Quality
Skewness        ?          ?             ?          ?
Kurtosis        ?          ?             ?          ?

(Data from BMDP analysis, Appendix S; ? = value illegible in
the source copy)

The effect of skewed departure from normality on parametric
tests to compare means is significant for small sample sizes
but insignificant for the large sample sizes used in this
study (Davies). Most parametric tests for the comparison of
means, however, require equal sample variances, and tests that
determine variance equality are significantly affected by skewed
sample distributions. The sample distributions all, with
one exception, have equal variances with a Type I risk of
five percent (Table 9). A calculation to test for the
equality of variances for the FMS quantity and quality data
is included in Appendix H. However, according to Davies,
with the degree of skewness and kurtosis exhibited by this
data the actual Type I risk becomes larger than the nominal
five percent (see Appendix G). This discrepancy does not
decrease as sample size is increased.
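The variance-equality comparison referred to above is the usual variance-ratio (F) test. A minimal sketch of the statistic follows: the larger sample variance is placed in the numerator, and the ratio is compared with the upper alpha/2 critical value of the F distribution with the corresponding degrees of freedom, taken from an F table. The function name is illustrative. As the text notes, this test assumes normal data; with skewed or heavy-tailed samples its true Type I risk can exceed the nominal level regardless of sample size.

```python
from statistics import variance


def variance_ratio(x, y):
    """F statistic (larger sample variance over smaller) and its degrees of freedom."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, (len(x) - 1, len(y) - 1)
    return vy / vx, (len(y) - 1, len(x) - 1)
```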

In summary, it is apparent that the sample distributions
exhibit marginal normality due to skewness. As the
normality assumption is not critical in the use of the
multiple linear regression model, this data is applicable to
regression but should be used with caution. In particular,
the regression residuals should be carefully inspected for
heteroscedasticity and, if necessary, remedial transformations
should be utilized. The use of parametric tests for the
comparison of sample distributions is questionable due to
the skewed nature of the distributions. In particular, it is
difficult to test for the equality of variances with a Type
I error of less than seven percent, while variance equality
is a requirement for most parametric tests. Non-parametric
tests may be applicable if the sample histogram distributions
can be used with this type of test.

Table 9

Comparison of Variance Equality
(see Appendix H)

Variance 1            Variance 2              F        df(a)     Type I Risk
FMS Quant. (4.25)     FMS Qual. (4.26)        .9966    66, 66    ?
FMS Quant. (3.91)     FMS MSEP (2.57)         1.520    ?         .05
FMS Quant. (6.20)     FMS RMSEP(b) (3.32)     1.66?    35, 35    ?
OMS Quant. (4.45)     OMS Qual. (3.33)        1.1?     79, 79    ?
OMS Qual. (3.06)      OMS MSEP (6.93)         .44?     ?         ?
OMS Qual. (3.35)      OMS RMSEP(b) (5.52)     ?        51, 51    ?
OMS Quant. (?)        FMS Quant. (4.45)       ?        66, 79    ?
OMS Qual. (4.25)      FMS Qual. (3.33)        1.109    66, 79    ?

(a) df = degrees of freedom.
(b) RMSEP includes inspection results for technicians
    with two or more inspections.
(c) N.S. (Not Significant). The null hypothesis is
    rejected and the variances cannot be accepted as
    equal.
(? = value illegible in the source copy)


Rating  Comparisons 


The  next  area  of  interest  is  tne  comparison  of  the 
quantity  and  quality  ratings  within  and  between  squadrons  to 
determine  if  any  differences  exist  for  the  ratine  distribu¬ 
tions.  Due  to  the  non-normal  distributions,  cne  tests  for 
differences  will  be  conservative  and  will  either  be  non- 
sarsmetrio  or  parametric,  with  variance  eouaiitv  net 

r*  '  v* 

«  *  *-  «■  ^  * 

According  t,o  3rc.ilov  ( 1963 }  ,  Gioocns  (1975 }  ,  ^.r.c 
Hollander  and  Wolfe  (1973) ,  most  r.cn- parametric  tests  are 
based  on  ranking  procedures.  Validity  is  seriously  affected 
bv  a  largo-  number  of  ties  in  the  sample  data,  and  that  occurs 
throughout this study's data. One non-parametric test that is
applicable to the comparison of sample distributions is the
Kolmogorov-Smirnov two-sample test (Gibbons, 1976). For this
test only two assumptions are needed relative to the study
data: first, that the quantity and quality ratings given to each
technician by a single supervisor are independent, and, second,
that the quantity and quality of performance data can be treated
as continuous variables. Because the ratings actually take only a
limited number of discrete values, the second assumption holds
only approximately; the probability (P) values for this test
should therefore be considered conservative (Gibbons,
1976:253). The Kolmogorov-Smirnov test results summarized in
Table 10 indicate that the quantity and quality rating
distributions for each squadron are different; however, the
test does not specify in what way the distributions differ.
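A minimal sketch of how the Kolmogorov-Smirnov two-sample statistic D can be computed is given below in Python (the thesis contains no code, and the function names are hypothetical). D is the largest vertical gap between the two empirical cumulative distribution functions. The P value uses the standard large-sample Kolmogorov series; this is an assumption, since the text does not say how its P values were obtained. With tied, discrete ratings such as these, that P value errs on the conservative side, as noted above.

```python
import math
from bisect import bisect_right


def ks_two_sample_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so checking every pooled value finds the largest gap.
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / n_a  # fraction of sample a <= x
        f_b = bisect_right(b, x) / n_b  # fraction of sample b <= x
        d = max(d, abs(f_a - f_b))
    return d


def ks_asymptotic_p(d, n_a, n_b, terms=100):
    """Large-sample P value from the Kolmogorov series (an assumption).

    With discrete, heavily tied ratings this P value is conservative.
    """
    lam = d * math.sqrt(n_a * n_b / (n_a + n_b))
    if lam < 0.3:
        return 1.0  # the series differs from 1.0 by less than 1e-5 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical ratings, not study data.
quantity = [6, 7, 8, 8]
quality = [7, 8, 9, 9]
d = ks_two_sample_d(quantity, quality)
print(d, ks_asymptotic_p(d, len(quantity), len(quality)))
```

Applied to one squadron's quantity and quality ratings, the first function would give a statistic of the kind reported in the K-S rows of Table 10. The asymptotic P value is a simplification; exact small-sample tables would be needed for very small groups.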

To compare the means of the different sample distributions,
the Behrens-Fisher approximate "t" test was used. This test
does not require equal variances or equal sample sizes,
although its formulation and solution are somewhat
controversial according to Dudewicz (1976:311). The results of
the approximate "t" tests are summarized in Table 10. These
results indicate a significant difference between the means
of the quantity and quality distributions within the squadrons.
No significant difference, however, is noted for the
distributions between the squadrons. A comparison of the means
of the squadron quality of performance distributions and MSEP
appraisals indicates a difference for FMS and no difference for
CMS. In the cases where significant differences were found, the
probability of a Type II error (erroneously accepting the null
hypothesis of equal means) for the differences in means tested
ranged from zero percent to thirty percent. The higher Type II
errors were found in the tests comparing the means of the
quality and quantity performance distributions within
squadrons. It appears that the Type II error size is influenced
by the histogram cell width, which produces a large sample
variance.
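The Behrens-Fisher problem has several approximate solutions, and the text does not say which one was used. The sketch below assumes Welch's approximation: t is the difference in means divided by the unpooled standard error, and the degrees of freedom come from the Welch-Satterthwaite formula. The function name is hypothetical. Converting t to a P value requires a t-distribution CDF from a statistics library, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Approximate t for a difference in means without assuming equal
    variances (Welch's solution to the Behrens-Fisher problem).

    Returns (t, df); df uses the Welch-Satterthwaite formula and is
    generally not an integer.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean a
    se2_b = variance(sample_b) / n_b  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical ratings with unequal spread, not study data.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")  # prints t = -1.897, df = 5.88
```

Because the variances are not pooled, the larger sample variance dominates the standard error. This is why wide histogram cells, which inflate the variance, weaken the test.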

In summary, conservative tests indicate significant
differences in the quantity and quality distributions within
the CMS and the FMS squadrons, but not between squadrons.
This significant difference in the distributions does not
Table 10

Comparison of Means
(see Appendix)

Mean          Mean          Test^a  Stat.^b   DF^c  Sig.^e Type I  Type II
                                                  Risk           Risk
FMS Quant.    FMS Qual.     K-S     .20 (D)         .05 (P)
(6.6)         (7.3)         B-F     2.2 (t)   ?     .05            .30
CMS Quant.    ?             K-S     ?               ?
(6.57)                      B-F     1.9 (t)   155   .10            .13
CMS Quant.    FMS Quant.    B-F     .57 (t)   ?     None
(6.57)        (6.6)
?             ?             B-F     ? (t)     111   None
FMS ?         MSEP^d (?)    B-F     5.6 (t)   151   .05            ?

^aTests: K-S, Kolmogorov-Smirnov non-parametric two-sample
test; B-F, Behrens-Fisher approximate "t" test.

^bCalculated test statistics: D for the Kolmogorov-Smirnov
test, t for the Behrens-Fisher approximate "t" test.

^cDF = degrees of freedom.

^dThe MSEP means were multiplied by ten for these tests.

^eSignificant Type I risk levels were reported when a
significant difference in means was found.
necessarily mean that individuals will have significantly
different quantity and quality ratings. The significant
difference does indicate that both distributions should be
used in the regression analysis. The association between
ratings and the validity of the quality of performance ratings
will be considered next.

Rating  Associations 

To determine the degree of association between quality
and quantity ratings, and between quality ratings and MSEP
data, requires some sort of association measure. Non-parametric
measures would be preferred in this case, since the parametric
measure, the linear correlation coefficient, assumes that the
distributions are drawn from a bivariate normal distribution
(Neter and Wasserman, 1974). Since all the sample distributions
are skewed and provide poor fits to the normal distribution, it
is unlikely that this basic assumption for the use of the
correlation coefficient can be met. However, non-parametric
association measures require that the data be ranked, whereas
the sample data in this study contain too many ties for such
measures to be valid.

Since non-parametric measures cannot be used, it is
necessary to use the Pearson product-moment correlation
coefficient as a descriptive measure. In this regard, Freund
(1977) states the following:

Note that the sample correlation coefficient r is
often used to measure the strength of a linear
relationship exhibited by sample data even if the
data do not come from a bivariate normal population.

Gibbons (1976) indicates that the use of an interval scale,
as is used in the ratings under study, to subjectively measure
performance also allows for the use of the Pearson
product-moment correlation coefficient as a descriptive
measure.
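As a sketch of the descriptive measure discussed above, Pearson's r can be computed directly from deviations about the means (the function names are hypothetical). The companion t statistic is the usual test of r = 0. Its t(n - 2) reference distribution assumes bivariate normal data, which is exactly the assumption these ratings fail, so it is included only to make that dependency explicit.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation, used here only descriptively."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), for |r| < 1.

    Referring this t to t(n - 2) is valid only for bivariate normal data.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical paired quantity/quality ratings for five technicians.
print(pearson_r([6, 7, 7, 8, 9], [7, 7, 8, 9, 9]))
```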

In this study all correlation coefficients were
calculated using the BMDP program. The results of these
comparisons are summarized in Table 11. The significance
levels reported in Table 11 identify the coefficients that are
non-zero. These results are limited, however, by the fact that
it cannot be shown that the sample distributions are drawn from
a bivariate normal distribution, the assumption on which
significance tests for correlation coefficients rest.
Table 11

Summary of Correlation Coefficients
for Quantity/Quality Associations and
Quality/MSEP Associations
(see Appendix J)
Interpretation of the correlation coefficients as
descriptive statistics indicates that the quantity and quality
of performance ratings for individual technicians are highly
correlated. There does not appear to be any linear correlation,
however, between technician quality ratings and MSEP
inspections. If only those individuals who have had two or more
inspections in the previous two years are included (revised
MSEP), a significant correlation results for CMS personnel.
There is still no correlation, however, between FMS ratings and
MSEP data.

In summary, little correlation exists between quality ratings
and MSEP data. These correlation statistics should be
considered as descriptive statistics only.

This section concludes the analysis of the statistical
data from the performance ratings. Next, the usefulness of
these ratings will be considered based on the opinions
of a few maintenance officers.
Opinion Survey Analysis


The opinions of a limited number of maintenance officers
were solicited concerning the usefulness of the performance
ratings and the relative importance of quantity and quality of
performance. The survey questions and letter of transmittal are
contained in Appendix K. Survey questionnaires were mailed to
eight maintenance officers whom the author knows personally;
replies were received from four, as the remaining four had
moved and left no forwarding addresses. The Deputy Commander
for Maintenance at Williams AFB also responded to the
questionnaire. In all, one Major, one Lieutenant Colonel, and
three Colonels responded. The responses are compiled in
Appendix L and represent three maintenance organization
commanders, one squadron commander, and one chief of
maintenance quality control.

The consensus opinion of these officers is that
individual performance is important to the organization and
is not limited to line technicians. The officers surveyed
felt that although the rating forms designed for this study
might be useful, they might not be valid if they were used as
APRs. No one, however, was able to provide a more appropriate
way to measure performance. One significant limitation of the
gradations of quality used on the rating form was noted: many
maintenance tasks require only compliance, with no gradations
in the quality of work required.

All  of  the  officers  considered  quality  and  quantity 
of  performance  to  be  important,  although  in  some  cases  they 
felt  that  one  cannot  be  considered  independently  of  the 
other.  All  considered  quality  of  performance  to  be  abso¬ 
lutely  overriding  in  importance  compared  to  quantity  of 
performance,  the  only  exception  to  this  being  in  times  of 
critical  emergencies,  such  as  wartime. 

This  very  small  sample  of  opinions  may  not  be  repre¬ 
sentative  of  management  viewpoints  on  the  subject  for  the 
entire Air Force. The author did feel, however, that some
feedback  from  those  who  might  be  using  this  information 
would  be  useful.  It  was  somewhat  surprising  to  discover 
that  quality  of  performance  was  considered  to  be  much  more 
important  than  quantity  by  all  of  the  officers  surveyed. 

Summary 

The results and analyses of this study have been
presented in this chapter. Considerations included a study of
the sample to determine whether the entire organization was well
represented and a careful analysis of the sample data's
statistical properties. In particular, the data was analyzed
for applicability to regression analysis. The rating
distributions were also tested for goodness-of-fit to normal
distributions. Quantity and quality of performance ratings
were also compared, and an attempt was made to validate the
quality ratings using MSEP results. Finally, maintenance
officer opinions concerning quantity and quality of
performance were summarized. All of these analyses will be
discussed and interpreted in the next chapter.
Chapter 5


DISCUSSION 

The purpose of this study is to determine the
following:

1. What is the best research method for evaluating
or measuring performance of aircraft maintenance
technicians in the United States Air Force?

2. Does this method for evaluating or measuring
performance provide useful and valid statistical
data?

The discussion of the findings will therefore cover
the performance appraisal method selected and the statistical
evaluation of the appraisal method. This discussion may lead
to findings that revise the existing body of knowledge
concerning subjective performance appraisals or improve
research methodologies.

The  Performance  Appraisal  Method 

The recommended performance appraisal method
(Appendices B and C) was developed through a review of the
literature on the subject. The literature review was
necessary because existing rating schemes are either not
applicable to statistical analysis, highly inflated, or
unuseable for research. Airmen Performance Ratings (APRs)
and civil service Merit Ratings are used for the administrative
purposes of promotion and demotion and seldom reflect job
performance alone. These ratings also tend to be inflated.
Callander (1979) reports that airmen have average APR
scores of 8.5 on a 9.0 scale, an inflated rating that is not
useable for research. Proficiency ratings are either paper
and pencil theory tests (Skill Knowledge Tests, or SKTs) or MSEP
evaluations, and neither covers the whole work force. Only
approximately 77% of the technicians at the test base, Williams
AFB, had received MSEP appraisals, owing to the high turnover
rate of personnel. Furthermore, SKT evaluations are not
applicable to civil service maintenance technicians, who make
up 17% of the line technicians at Williams AFB. Since existing
performance data was not applicable, a new rating scheme was
developed based on a review of the literature and the
restrictions imposed by the maintenance organization.

The  size  of  the  Air  Force  maintenance  organization 
required  a  measurement  scheme  applicable  to  civilian  and 
military  technicians  of  all  races  and  sexes  performing  many 
tasks  ranging  from  servicing  aircraft  to  repairing  missile 
guidance  systems.  It  was  thus  apparent  that  the  organiza¬ 
tion  size  and  structure  restricted  useable  performance 
measures  to  general  criteria,  such  as  quantity  and  quality 
of  performance  based  on  subjective  appraisals  by  supervisors. 
The  measurement  of  quantity  and  quality  of  performance 
presented  a  difficult  problem. 
Of the appraisal methods reviewed, the only applicable
methods  appeared  to  be  straight  ranking  and  the  use  of  rat¬ 
ing  scales.  Both  methods  are  based  on  subjective  appraisals 
by  immediate  supervisors.  Barrett  (1966:71)  reported  that 
the  use  of  graphic  scales  following  a  forced  ranking  proce¬ 
dure  increases  the  accuracy  of  the  ratings.  This  method  was 
adapted  for  this  study  and  should  have,  in  theory,  provided 
rating scale performance values which were normally distributed.

The actual rating scale format was designed to
minimize errors of leniency (see Appendices B and C). These
recommended rating forms were appropriate based on a review
of the literature concerning appraisals and on the nature of
the Air Force maintenance organization. The suggested rating
forms have face validity if previous research conclusions
are accepted.

The following sections discuss the statistical
qualities that resulted from an actual test of the rating
forms within one Air Force aircraft maintenance organization.
In addition, the opinions on the usefulness of the rating
forms that were collected from several maintenance officers
will be discussed.

Evaluation  of  the  Rating  Method 

The evaluation of the rating method was conducted at
Williams AFB, Arizona. This is a relatively small pilot training
base operating two jet trainers, the T-37 and the T-38, which are
mechanically simple compared to other aircraft in the Air Force
inventory. Thus, the conclusions of this study may be limited
by the restriction of the test to one training base. One
prerequisite of the rating form tested was its applicability to
both military and civilian maintenance technicians. Both types
of technicians were included in the sample drawn from the
maintenance organization.

Test  Sampling  Procedures 


The random sampling procedure resulted in representative
proportions of civilian and military technicians. Rating
the technicians proved to be no difficulty for any of the
supervisors. It is significant to note that a majority of
technicians within the FMS squadron were civilians or had
attained the rank of sergeant or above. The CMS squadron,
conversely, was primarily military, with the majority being
airmen. It can generally be concluded that the civilians and
the Air Force personnel with sergeant rank and above have much
more work experience than the airmen.

The sampling procedure also provided a representative
proportion of technicians from each of the CMS branches. The
FMS branches were not, however, proportionately represented
by the sample. This uneven distribution was primarily due to
the large number of independent sections or shops in FMS
having very few people; the swing and mid shifts in the more
heavily populated shops were excluded, as fewer than three
technicians worked for a supervisor at any one time.
In general, the sampling procedure was extremely
successful in providing a representative segment of the
organization. By randomly selecting supervisory groups
rather than individual technicians, the rating process was
expedited. And by randomly selecting individuals from the
supervisory groups, the researchers controlled who would be
rated and avoided the bias that could have arisen had the
supervisors selected the technicians themselves.
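The two-stage procedure described above (random supervisory groups first, then random technicians within each group) might be sketched as follows. The group-size limits are taken from the study: at most five technicians per group, and groups with fewer than three technicians excluded. The data layout, function name and seed parameter are hypothetical, and the text does not state how the random draws were actually made.

```python
import random


def two_stage_sample(groups, n_groups, max_per_group=5, min_group_size=3, seed=None):
    """Randomly pick supervisory groups, then technicians within each group.

    groups: dict mapping a supervisor ID to the list of technicians that
    supervisor rates (a hypothetical data layout).
    """
    rng = random.Random(seed)
    # Groups too small to rate (fewer than min_group_size) are excluded.
    eligible = {sup: techs for sup, techs in groups.items()
                if len(techs) >= min_group_size}
    # Stage 1: simple random sample of groups (keys sorted for reproducibility).
    chosen = rng.sample(sorted(eligible), n_groups)
    # Stage 2: random technicians within each chosen group, so supervisors
    # do not choose who gets rated.
    return {sup: rng.sample(eligible[sup], min(max_per_group, len(eligible[sup])))
            for sup in chosen}


roster = {  # hypothetical roster, not study data
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3"],
    "C": [f"c{i}" for i in range(1, 9)],
    "D": ["d1", "d2", "d3", "d4"],
}
print(two_stage_sample(roster, n_groups=3, seed=1979))
```

Sampling whole groups keeps each rater's workload small, while the second stage removes the supervisor's choice of whom to rate.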

Rating  Distributions 

The skewed nature of the quantity and quality histogram
distributions (Figures 14 and 15) was not expected.
The rating forms were designed to produce symmetric, normal
distributions that reflect the relative performance of
technicians. There are two possible explanations for the
consistently high, skewed quality ratings: (1) the supervisors
as a whole may be extremely lenient, or (2) the supervisors may
be interested in consistent compliance and no more. There is no
way to differentiate between these two explanations. It is
possible, however, that many maintenance tasks are designed to
be simple so that the technician, according to one survey
response (Appendix L), "either can do the job, or he can't with
no gradations in quality." If supervisors rate technicians in
this light, then the quality histograms would be expected to
display the skewness observed in the test data. It would be
interesting to analyze technician tasks to discover whether
quality gradations exist or whether tasks are considered to be
simply done or not done. At any rate, these quality of
performance distributions were considered useful for lack of
any better performance measure.

The MSEP inspection distributions are more severely
skewed than the research sample quality ratings (Figures 16
and 17) and omit some 20% of the line technicians. Sauer,
Campbell, and Potter (1977) observed the same difficulty with
MSEP data and also determined that the data was not applicable
as a measure of performance in constructing mathematical
models.

The quantity of performance ratings exhibited
distributions that were less skewed than the quality ratings,
and the quantity and quality distributions were significantly
different. The difference in the distributions was established
using the Kolmogorov-Smirnov test (Appendix I). Despite the
differences between quantity and quality, the quantity ratings
were still not symmetric and still had mean values greater than
the scale midpoint. The difference in the ratings is worthy of
note, but the skewed quantity distributions could also be the
result of a halo effect from the quality ratings or, again,
rater leniency. There is no way to differentiate between these
influences. In fact, there is no existing data with which to
compare quantity ratings.

This information may give an added dimension to technician
performance that is not currently considered but may be
important in contingency situations.
Rating Distributions and Normality


The skewed nature of all the sample distributions
makes it unlikely that they represent normal populations
(Table 7). This does not create a serious problem for
regression analysis but does restrict the types of tests
that can be used for comparing distributions.

The skewed distributions should be used with caution
in any regression analysis. It is possible that the error
terms of the regression model may not have constant variance
(heteroscedasticity) as a result of skewness. According to
Neter and Wasserman, this condition can best be investigated
after a preliminary regression model is developed; if
necessary, a transformation of the performance data can then be
made to correct the problem. Such a transformation often also
helps to normalize the data.
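One transformation often tried for ratings bunched near the top of a bounded scale is to reflect the scale and then take a square root. It is offered here only as an illustration: it is an assumption, not a transformation the study applied, and the text advises choosing a transformation only after examining a preliminary regression model. The sketch compares moment skewness before and after the transformation; the function names and sample ratings are hypothetical. Note that the reflection reverses the direction of the scale, so a high transformed value means a low rating.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def reflect_sqrt(values, scale_max=10):
    """Reflect a left-skewed rating about the top of the scale, then take
    the square root; this reverses the scale direction."""
    return [math.sqrt(scale_max + 1 - v) for v in values]


ratings = [3, 6, 7, 8, 8, 9, 9, 9, 10, 10]  # hypothetical, not study data
print(round(skewness(ratings), 2), round(skewness(reflect_sqrt(ratings)), 2))
```

Whether such a transformation also stabilizes the residual variance can only be judged against the fitted model, as the text notes.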

The  skewed  distributions  do  present  problems  with 
tests  that  compare  distribution  means  or  variances.  In 
particular,  the  equality  of  variances  is  questionable  in 
this  situation.  This  means  that  non-parametric  tests  or 
tests  which  do  not  require  equal  variances  should  be  used  to 
compare  distributions. 
The skewed distributions made it difficult to
determine whether any difference existed between quantity and
quality ratings. The histogram distributions were based on
larger cell intervals than initially intended because of the
grouping of ratings around scale numbers. This, in turn,
produced large sample variances (s² = 4.0) and many ratings in
separate distributions with the same values. As a result,
parametric tests for comparing means had reduced power, while
most non-parametric tests for comparing means were not useable
because of the tied rating values.
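The effect of wide histogram cells on the variance can be illustrated by computing the mean and variance from grouped data (cell midpoints and counts). The optional adjustment subtracts h²/12 for cell width h. This is Sheppard's correction, a standard allowance for grouping that the study does not use. It assumes a smooth underlying distribution, which ratings bunched at scale numbers may not satisfy, so the corrected value should be treated as a rough guide only. The function name and histogram values are hypothetical.

```python
def grouped_mean_variance(midpoints, counts, cell_width=None):
    """Mean and sample variance from histogram cells (midpoint, count).

    If cell_width is given, Sheppard's correction (cell_width**2 / 12)
    is subtracted from the variance as a rough allowance for grouping.
    """
    n = sum(counts)
    mean = sum(m * c for m, c in zip(midpoints, counts)) / n
    ss = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts))
    var = ss / (n - 1)
    if cell_width is not None:
        var -= cell_width ** 2 / 12
    return mean, var


# Hypothetical histogram: cells 2 points wide, centered on 5, 7 and 9.
print(grouped_mean_variance([5, 7, 9], [2, 3, 5]))
print(grouped_mean_variance([5, 7, 9], [2, 3, 5], cell_width=2))
```

The wider the cells, the larger the grouping term, which is consistent with the text's observation that wide cells inflated the sample variances.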

To overcome these difficulties, conservative tests
were used. The Kolmogorov-Smirnov non-parametric test
showed that the quality and quantity rating distributions
were different for both squadrons. The Behrens-Fisher
approximate "t" test indicated that the means of the quantity
and quality distributions for CMS and FMS differed. The
quantity and quality distribution means for FMS were accepted
as different with a Type I risk of 10% and a Type II risk of
13%. The quantity and quality distribution means for CMS
were accepted as different with a Type I risk of 5% and a
Type II risk of 30%. These differences indicate that
quantity and quality ratings should be considered separately
in evaluating technician performance. The differences do
not indicate, however, that a technician's rating on one factor
will not be reflected in the other factor.
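A normal approximation to the power of a two-sided test of equal means shows why a large variance inflates the Type II risk. In the sketch below, beta is the probability of failing to detect a true difference mean_diff when the standard error of the difference is se. The function names are hypothetical, and the normal approximation to the t distribution is an assumption that is reasonable only for large degrees of freedom. The study does not say how its Type II risks were computed, so this is not a reproduction of those figures.

```python
import math
from statistics import NormalDist


def standard_error(s_a, n_a, s_b, n_b):
    """Unpooled standard error of a difference in two means."""
    return math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)


def approx_type_ii_risk(mean_diff, se, alpha=0.05):
    """Normal-approximation beta for a two-sided test of equal means.

    beta is the chance that the test statistic stays inside +/- z even
    though the true difference is mean_diff, i.e. a real difference
    goes undetected.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(mean_diff) / se
    return NormalDist().cdf(z - shift) - NormalDist().cdf(-z - shift)


# Same hypothetical 0.7-point difference with two different standard errors.
for se in (0.35, 0.50):
    print(se, round(approx_type_ii_risk(0.7, se), 3))
```

Holding the true difference fixed, a larger standard error, such as one produced by wide histogram cells, gives a larger beta.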

The difficulties encountered in evaluating the ratings
might be overcome by revising the scale format to eliminate
the bunching of ratings around scale numbers, thus attempting
to force normal distributions. For now, however, the data is
useful in regression analysis with certain restrictions, and
the results do differentiate between quantity and quality
ratings. Whether or not the ratings are valid when compared
with existing data will be discussed in the next section.


Validity  and  Association 

If the theoretical basis of the tests is sound and if
the maintenance officers surveyed are to be believed, the
performance ratings do have face validity. It is difficult,
however, to find any agreement between the quality ratings
and existing MSEP data.

No comparative data exists for the quantity ratings.
It is interesting to note that supervisors tended to associate
high quantity and high quality ratings for technicians. This
could be due to a halo effect or could simply reflect the
opinion that quantity and quality of performance are
related.

Attempting to validate the quality ratings using MSEP
data was not very successful. The limitations of the MSEP
data were particularly difficult to deal with. More than 20%
of the technicians had received no inspections, owing to the
rapid turnover of personnel and in spite of a local policy of
administering an inspection every eighteen months. The
resulting MSEP data, even after an attempt was made to
interpret the raw data in relation to performance baselines,
proved to be even more skewed and unuseable than the rating
distributions. It should be noted that the MSEP data covers
two years and is used to determine section or branch trends,
not to reflect individual performance. MSEP inspections also
emphasize compliance and safety rather than gradations of
quality performance. The MSEP data was still used, in spite of
these limitations, in an attempt to validate the quality
ratings, as it was the only existing record available. The
correlation analysis revealed no significant relation between
MSEP and the quality ratings.

When only technicians with two or more MSEP inspections
were included in the correlation analysis, a low but
significant non-zero correlation (.368) was noted for CMS,
while no correlation at all existed for FMS. The revised
CMS MSEP data in fact exhibited a normal distribution and a
mean value somewhat lower than the CMS quality ratings; thus,
if such data existed for all CMS personnel, it might be a
superior measure of performance to the ratings. As for the
FMS MSEP data, it may have remained highly skewed because of
the difficulty of inspecting highly technical tasks, the
relatively fewer average inspections, or the overall high
experience level within the squadron. At any rate, the FMS
quality ratings are superior to MSEP data for statistical
research.

In summary, the quality and quantity ratings could
not be conclusively validated using MSEP results. It does
appear, however, that without more frequent MSEP inspections
of all personnel, the performance rating distributions
provide more useable and representative data for use in
statistical research. The performance ratings at least
provide face validity.

Maintenance  Officer  Opinions 

The limited number of maintenance officer opinions
solicited concerning the quantity and quality of performance
ratings produced some interesting results. Although quantity
and quality were both considered important, they were not
considered to be independent of one another.
The majority of officers also felt that quality was the more
important factor compared to quantity in all but the most
dangerous national emergencies. This is surprising, since
the author has personally witnessed many officers pressuring
technicians to do repairs rapidly. For this reason it is
felt that both ratings should be of significant value to
maintenance managers even if the emphasis continues to remain
on quality performance.

Summary 

The purposes of this study were to provide a method
for evaluating or measuring the performance of Air Force
maintenance technicians and to provide useful and valid
performance data for statistical analysis. Based on a
review of the literature, the performance evaluation method
developed for this study provided a measure which (1) was
understood by managers and supervisors, (2) was applicable
to both military and civilian technicians, (3) was applicable
to different types of performance tasks, and (4) provided a
performance measure through many levels of weapons system
maintenance. The statistical properties of the rating
method, however, were discovered to be less than ideal.


The test sample revealed particular statistical
properties which were not expected based on the literature
review; particularly surprising were the skewed quantity and
quality of performance distributions. These skewed distributions
made most non-parametric tests impossible and reduced the power
of parametric tests, even with large sample sizes. Even with
these considerations, however, it can be said that the data
from the recommended performance rating method is useable
for regression analysis and does differentiate between
quality and quantity of performance.

These particular conclusions regarding the test
statistics cannot be compared to the results of tests of
other rating forms, since such results do not appear to be
widely reported. It was certainly surprising to discover
that a method based upon previous theory and research did
not produce the symmetrical distributions other authors
reported; it should be mentioned, however, that most reports did
not include data on tests for symmetry or equality of variances
or means, let alone the sample rating distributions.

Attempts to validate the quality of performance data
were marginally successful for one squadron. The existing
MSEP data used for the comparisons was even more skewed than
the performance rating distributions and omitted some 20% of
the sample. More work needs to be done to determine whether the
MSEP and the rating distributions were skewed as a result of
an emphasis on technician task compliance or simply of rater
and inspector leniency. At any rate, the data does not
support the use of forced normal distributions for technician
rankings on quantity and quality of performance.

It is evident that although the present method has
limitations, it is superior to existing information on
individual technician performance. The potential exists
for carefully monitoring the quality of maintenance technician
performance using MSEP data, if Air Force management
considers it useful. This information does not exist now,
however, and no information at all is available concerning
the quantity of performance. Thus, for lack of any superior
system, the subjective ratings of technician quantity and
quality of performance are useful and acceptable as sources
of performance statistics.


Chapter  6 


CONCLUSION 

One of the greatest needs of managers of the military
weapons system maintenance complex is to measure accurately
how well individuals perform on the job. Individual job
performance is one of the bases for performance by the entire
organization. If the effectiveness of weapons system
maintenance is to be improved, then individual performance must
be measurable and subject to improvement.

Quantifying job effectiveness is, however, difficult.
Decades of research by psychologists and personnel experts
have failed to provide definitive answers to the question of
how to measure performance or effectiveness. The main
purpose of this study was to find or develop some method for
evaluating and measuring the performance of aircraft
maintenance technicians in the United States Air Force. This
evaluation method, once developed, is to be used as a
performance measure of manpower effectiveness in another
research effort. The purpose of this subsequent research
effort will be to develop a model or models for predicting
or evaluating the effectiveness of maintenance technician
performance (see Young, 1979:15).

The recommended performance appraisal method
(Appendices B and C) was developed through a review of the
literature on the subject. The literature review was
necessary because existing appraisal methods such as APRs, MSEP,
or SKTs either were not applicable to statistical analysis,
were highly inflated, or provided incomplete and non-current
coverage of the organization. The method that was developed
relied on subjective supervisor appraisals of a technician's
quantity and quality of performance. This suggested method
has face validity, if previous research conclusions are to
be accepted.

The evaluation of the performance appraisal method
was conducted within the aircraft maintenance organization of
one pilot training base, Williams AFB, Arizona.

The evaluation at one base limits the generality of the test
conclusions. A sample selection method was developed that
actually paired supervisors with their subordinate technicians
and provided a representative portion of the maintenance
organization. A sample size of 20% of the organization
provided adequate statistical test errors, with the exception
that the rating scale and resulting rating distribution
increased the test error. A change in the rating scale
eliminating numbered gradations in quality and quantity might
eliminate this problem.

The test evaluation of the performance appraisal
method also resulted in skewed quantity and quality ratings,
results which were not expected based on the literature
review. It is difficult to determine from the present
study whether these skewed ratings represented an emphasis on
technician task compliance or simply rater leniency.
The ratings were certainly more complete and less skewed
than existing Maintenance Standardization and Evaluation
Program (MSEP) personnel inspections. The skewed ratings do
introduce some restrictions in the development of a regression
model to predict or evaluate the effectiveness of maintenance
technician performance; special care should be taken
to identify and correct for heteroscedasticity. Quality and
quantity ratings were differentiable and should both be used
to represent performance, although the maintenance officers
surveyed indicated that quality of performance is more
essential to mission accomplishment than quantity of
performance.

Attempts to validate the quality of performance were
marginally successful for the Operational Maintenance
Squadron (OMS) involved in the test. The attempts to validate
quality ratings for the Field Maintenance Squadron (FMS)
were unsuccessful. MSEP personnel inspection data was used
for these comparisons and proved to be highly inflated and
non-representative of the organization. No data existed with
which to compare the quantity of performance ratings.

Despite  these  difficulties,  the  performance  rating 
method  provides  useful  data  with  face  validity  which  can 
be  obtained  for  a  representative  segment  of  an  Air  Force 
maintenance  organization.  It  is  evident  that  the  rating 
data  must  be  used  with  care  in  attempting  to  develop  a 
model  of  organizational  effectiveness. 




Contributions  and 
Future  Considerations 

This study makes several contributions to the field
of performance appraisal within Air Force organizations. The
recommended performance rating method is new and provides
useful information. In testing the performance rating
method, a sample selection technique was developed that
provided input from supervisors and their immediate
subordinates and provided a representative segment of the
maintenance organization. Many previous studies failed to ensure
that supervisors were actually evaluating their subordinates.
Most significantly, an analysis of the test results provided
information on the statistical effects of skewed distributions,
of rating scales based on numbered gradations of
performance, and of the use of histograms in situations
where non-parametric statistical tests are required. These
contributions were offset, however, by areas which were
found to require further evaluation.

A superior rating scale might be suggested by the
results of this study. Such a scale would have only end- and
mid-point descriptions (e.g., "slowest," "where most
perform," and "fastest"), no numbered gradations, and one
scale mark at the mid-point. Values for such a scale would
be recorded by the researcher, who would be using a separate
numbered scale. Such a scale should provide symmetrical
performance distributions with small variance about the
means and correspondingly low Type II errors in comparative
statistical tests. Such a scale, however, would have low
face validity and might not be acceptable to supervisors or
maintenance managers.


Whatever rating method is used, it should certainly be
retested at more locations than the one base used in this study.
It might also be wise to include second-level supervisors'
ratings as a comparative and controlling influence on the
ratings of technician performance by immediate supervisors.

It is unfortunate that existing performance appraisal
information cannot be used to evaluate technician performance
for this research effort. Existing data would be superior
to the proposed performance ratings in terms of validity and
acceptability. However, Airman Performance Ratings
and Skill Knowledge Tests (SKTs) are not appropriate.

MSEP data, on the other hand, could be suited ... Force
requirements ... regular inspections ... individuals ...
randomly selecting inspection samples. As it is used,
MSEP is supposed to provide trend analysis data for maintenance
sections or branches, but this data is based on invalid
sampling procedures. The MSEP data was suspiciously
skewed with many perfect appraisals and, as a result, was
not strongly related to current supervisor ratings. The
data could, however, provide more useful information if Air
Force personnel inspection criteria were revised.
Final Comment


It is evident from this study that although the
recommended subjective performance appraisal method has
limitations, it is superior to existing information on
individual technician performance. The appraisal method has
face validity for the evaluation of aircraft maintenance
technicians in the United States Air Force. It also provides
useful statistical data. Existing information on the quality
of technician performance is potentially useful, but it
falls short of being representative and inclusive. Little
information is available concerning the quantity of technician
performance. Thus, for lack of any superior system, the
subjective ratings of technician quantity and quality of
performance developed in this study are useful and acceptable
sources of performance statistics.


REFERENCES 


Barrett, Richard S. Performance Rating. Chicago: Science Research Associates, 1960.

Bayroff, A.G., H.R. Haggerty, and E.A. Rundquist. "Validity of Ratings as Related to Rating Techniques and Conditions." Personnel Psychology, 1954, 7, 93-112.

Bendig, A.W. "Reliability and the Number of Rating Scale Categories." Journal of Applied Psychology, 1954, 38, 38-40.

Berk, Kenneth N., and Ivor S. Francis. "A Review of the Manuals for BMDP and SPSS." Journal of the American Statistical Association, 1978, 361, 65-71.

Beyer, William H., ed. CRC Standard Mathematical Tables. West Palm Beach, Fla.: CRC Press, 1978.

Bittner, R. "Developing an Industrial Merit Rating Procedure." Personnel Psychology, 1948, 1, 403-432.

Bradley, James V. Distribution-Free Statistical Tests. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1968.

Brogden, H.E., and E.K. Taylor. "The Theory and Classification of Criterion Bias." Educational and Psychological Measurement, 1950, 10, 159-186.

Callander, Bruce. "Hikes: 1 out of 3 Makes E-7." Air Force Times, July 2, 1979, p. 4.

Campbell, J.P., and others. Managerial Behavior, Performance, and Effectiveness. New York: McGraw-Hill, 1970.

Campbell, J.T., E.P. Prien, and L.G. Brailey. "Predicting Performance Evaluations." Personnel Psychology, 1960, 12, 435-440.

Cummings, L.L., and D.P. Schwab. Performance in Organizations. Glenview, Ill.: Scott, Foresman, Inc., 1973.

Davies, Owen L. The Design and Analysis of Industrial Experiments. London: Oliver and Boyd, 195?.

Dixon, W.J., ed. BMDP Biomedical Computer Programs. Los Angeles: University of California Press, 1977.




Dudewicz, Edward J. Introduction to Statistics and Probability. New York: Holt, Rinehart, and Winston, 1976.

Dunnette, M.D. "A Note on the Criterion." Journal of Applied Psychology, 1963, 47, 251-259.

Flanagan, J.C. "The Critical Incident Technique." Psychological Bulletin, 1954, 51, 327-358.

Foley, John P. Evaluating Maintenance Performance: An Analysis. AFHRL-TR-?5-5?(I). AD-053-475. Wright-Patterson AFB, OH: Advanced Systems Division, Air Force Human Resources Laboratory, December 1979.

Freund, John E., and Irwin Miller. Probability and Statistics for Engineers. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1977.

Gibbons, Jean D. Nonparametric Methods for Quantitative Analysis. New York: Holt, Rinehart, and Winston, 1976.

Glaser, R., and A.J. Nitko. "Measurement in Learning and Instruction." In R.L. Thorndike (Ed.), Educational Measurement. Washington: American Council on Education, 1971, 625-670.

Guilford, Joy P. Psychometric Methods. New York: McGraw-Hill, 1959.

Guion, Robert M. Personnel Testing. New York: McGraw-Hill, 1965.

Habbe, S. "Marks of a Good Worker." Management Record, 1956, 13, 163-170.

Hollander, Myles, and Douglas A. Wolfe. Nonparametric Statistical Methods. New York: John Wiley and Sons, 1973.

Hollingworth, H.L. Judging Human Character. New York: Appleton, 1922.

Holly, W.H., H.S. Feild, and N.J. Barnett. "Analyzing Performance Appraisal Systems: An Empirical Study." Personnel Journal, 1976, 55, 957-950.

Jones, L.V., and L.L. Thurstone. "The Psychophysics of Semantics: An Experimental Investigation." Journal of Applied Psychology, 1955, 39, 31-36.

Lapin, Lawrence L. Statistics for Modern Business Decisions. New York: Harcourt, Brace, Jovanovich, Inc., 1973.




Lawler, E.E., III. "The Multitrait-Multirater Approach to Measuring Managerial Job Performance." Journal of Applied Psychology, 1967, 51, 369-381.

Lewin, Arie Y., and A. Zwany. "A Model Literature Critique and a Paradigm for Research." Personnel Psychology, 1976, 29, 423-447.

Locker, Alan H., and K.S. Teel. "Performance Appraisal: A Survey of Current Practices." Personnel Journal, 1977, 56, 245-247.

Lopez, Felix M. Evaluating Employee Performance. Chicago: Public Personnel Association, 1968.

McDonnell, James A. "Distaff Mechanics Doing OK." Air Force Magazine, 1979, 62, 7?-?0.

McGregor, Douglas. "An Uneasy Look at Performance Appraisal." Harvard Business Review, 1957, 35, 89-94.

Meister, D., D.L. Finley, and E.W. Thompson. Relationship Between System Design, Technician Training, and Maintenance Job Performance on Two Autopilot Subsystems. AFHRL-TR-73-20. Bunker-Ramo Corporation. Wright-Patterson AFB, OH: Advanced Systems Division, Air Force Human Resources Laboratory, 1971.

Millard, Cheedle W., F. Luthans, and R.I. ... "... Breakthrough for Performance Appraisal." Business Horizons, 1976, 19, 66-??.

Miner, J. "Management by Appraisal: A Capsule Review and Current References." Business Horizons, 1968, 11, 33-94.

Muller, Mervin E. "A Review of the Manuals for BMDP and SPSS." Journal of the American Statistical Association, 1978, 361, 71-80.

Neter, John, William Wasserman, and G.A. Whitmore. Fundamental Statistics. Boston: Allyn and Bacon, Inc., 1973.

Neter, John, and William Wasserman. Applied Linear Statistical Models. Homewood, Ill.: Richard D. Irwin, Inc., 1974.

Obradovic, J. "Modification of the Forced Choice Method as a Criterion of Job Proficiency." Journal of Applied Psychology, 1970, 54, 228-233.


Ronan, W.W., C.L. Anderson, and T.L. Talbert. "A Psychometric Approach to Job Performance: Fire Fighters." Public Personnel Management, 1976, 40?-422.

Rush, C.H. "A Factorial Study of Sales Criteria." Personnel Psychology, 1953, 6, 140-157.

Sauer, D.W., W.B. Campbell, and N.R. Potter. Human Resource Factors and Performance Relationships in Nuclear Missile Handling Tasks. AFHRL-TR-76-35, AFWL-TR-?6-30. Wright-Patterson AFB, OH: Advanced Systems Division, Air Force Human Resources Laboratory, 1977.

Snedecor, George W., and William G. Cochran. Statistical Methods. Ames, Iowa: The Iowa State College Press, 1956.

Stevens, S.H., and E.F. Wonderlic. "An Effective Revision of the Rating Technique." Personnel Journal, 19??, 125-134.

Swezey, R.W., and R.B. Pearlstein. "Developing Criterion-Referenced Tests." JSAS Catalog of Selected Documents in Psychology, 1975, 5, 227.

Taylor, E.K., R.S. Barrett, J.W. Parker, and L. Martens. "Rating Scale Content: II. Effect of Rating on Individual Scales." Personnel Psychology.

Taylor, E.K., and Grace E. Manson. "Supervised Ratings: Making Graphic Scales Work." Personnel, 195?, 504-51?.

Thornton, G. "The Relationship Between Supervisory and Self-Appraisal of Executive Performance." Personnel Psychology, 1968, 21, 441-455.

Travers, R.M. "A Critical Review of the Validity and Rationale of the Forced Choice Technique." Psychological Bulletin, 1951, 48, 62-70.

Uhrbrock, R.S. "2000 Scaled Items." Personnel Psychology, 1961, 14, 375-420.

U.S. Air Force. Air Force Manual (AFM) 66-1, Maintenance Management. 10 vols. Washington: Government Printing Office, 1975 and 1977.

Van Zelst, R., and W. Kerr. "Worker Attitude Toward Merit Rating." Personnel Psychology, 1953, 6, 159-172.




Whitla, Dean K., and John E. Tirrell. "The Validity of Ratings of Several Levels of Supervisors." Personnel Psychology, 1953, 6, 461-466.

Wikstrom, W.S. Managing by and with Objectives. National Industrial Conference Board, Personnel Study No. 212, 1968.

Wiley, L.N. Task Level Performance Criteria Development. AFHRL-TR-77-7?, AD-055-69?. Brooks AFB, TX: Occupational and Manpower Research Division, Air Force Human Resources Laboratory, December 1978.

Young, H.H. Performance Effectiveness in the Air Force Maintenance System: Preliminary Report and Design Report. Wright-Patterson AFB, OH: Advanced Systems Division, Air Force Human Resources Laboratory, 1978. (Internal Report)


APPENDIX  A 

MAINTENANCE  TECHNICIAN  SURVEY 
PRIVACY  STATEMENT 




The attached survey is part of a research effort
being conducted by Arizona State University under contract
with the Air Force Office of Scientific Research, and with
the cooperation of the Air Force Human Resources Laboratory,
Advanced Systems Division, WPAFB, Ohio. The purpose
of the survey is to further identify factors which influence
performance effectiveness in maintaining Air Force aircraft
and missile systems.

Your participation in the survey is voluntary but
strongly desired. Your responses will be held confidential
and in no way will impact upon your career or upon the
squadron to which you are assigned. Headquarters USAF
Survey Control Number 80-11 has been assigned to this
survey.

A.  Authority: 

(1) 5 U.S.C. 301, Departmental Regulations; and

(2) 10 U.S.C. 9012, Secretary of the Air Force: Powers, Duties, Delegation by Compensation;

(3) DOD Instruction 1100.13, 17 Apr ??, Surveys of Department of Defense Personnel; and/or

(4) AFR 30-23, 23 Sep 76, Air Force Personnel Survey Program.

B.  Principal  Purposes:  To  collect  information  from 
Air  Force  and  civilian  squadron  maintenance 
personnel  concerning  their  perceptions  of  factors 
which  influence  their  performance  effectiveness. 

To  initiate  the  development  of  an  Air  Force 
Maintenance  Performance  Effectiveness  Model  based 
on  the  survey  results  and  other  inputs. 

C.  Routine  Uses:  Data  will  be  used  for  research 
purposes  in  initiating  a  predictive  model  of 
maintenance  performance  effectiveness. 

D.  Participation  is  voluntary.  However,  your 
cooperation  is  requested. 

E.  No  adverse  action  of  any  kind  may  be  taken  against 
any  individual  who  elects  not  to  participate  in 
any  or  all  of  this  survey. 
APPENDIX D

DATA

OMS

Tech.  Quant.   Qual.    MSEP   Raw MSEP
   1      8.0     6.6    .667   1/3, 1/3
   2      8.0     9.0    .888   0/2, 0/3, 1/3
   3      8.0    10.0    .611   0/3, 1/3, 1/3, 1/3, 3/3, 1/3
   4      7.0     8.0
   5      8.0     6.0    .563   2/2, 7/12, 0/3, 2/12
   6      1.0     2.0
   7      5.0     8.0    .333   12/12, 1/3
   8      6.0     7.0    .333   2/3, 2/3
   9      5.0     3.0    .833   0/3, 1/3
  10      7.0     9.0    .667   0/3, 2/3
  11      9.0     9.0    .667   0/3, 2/3
  12      8.0     8.0    .888   0/3, 0/3, 1/3
  13      3.0     8.0    .528   5/5, 5/12, 0/3
  14      6.0     8.0    .611   0/3, 0/3, 0/3, 2/3, 2/3
  15      6.0     6.0    .417   1/3, 0/3, 3/3, 3/3
  16      9.0     9.0    .500   6/6, 0/3
  17      2.0     4.0    .536   0/3, 7/12, 0/3, 0/3, 2/3, 3/3, 3/3
  18      8.0     8.0    .000   3/3
  19      6.0     6.0    .777   1/3, 1/3, 0/3
  20      8.0     8.0    .444   0/3, 2/3, 9/9
  21      7.0     7.0    .381   0/3, 1/3, 0/3, 0/3, 3/3, 9/9, 1/3
  22      2.0     3.0    .667   1/3, 1/3
  23      4.0     4.0
  24      9.0     9.0    .733   0/3, 1/3, 2/3, 1/3
  25      8.0     9.0
  26      7.0     7.0    .263   6/11, 2/3, 3/3
  27      9.0     9.0    .625   3/4, 0/3
  28      9.0     9.0    .750   2/3, 0/3, 0/3, 3/9
  29      7.0     5.0    .472   0/3, 3/3, 1/3, 7/9
  30      8.0     9.0    .333   0/3, 0/3, 1/3, 2/3
  31      9.0     9.0    .778   0/3, 1/3, 1/3
  32      5.0     3.0    .000   1/1, 3/3
  33      7.0     8.0
  34      7.0     7.0    .200   3/5, 2/2
  35      6.0     6.0    1.00   0/3
  36      5.0     8.0    1.00   0/3
  37      5.0     7.0    1.00   0/3
  38      6.0     6.0
OMS (cont.)

Tech.  Quant.   Qual.    MSEP   Raw MSEP
  39      4.0
  40      7.0     9.0    .722   5/9, 0/3
  41      3.0     2.0
  42     10.0     7.0    1.00   0/3, 0/3, 0/3
  43      9.0     8.0    1.00   0/3
  44      8.0     9.0    .667   0/3, 0/3, 3/3, 1/3
  45      7.0     8.0
  46      9.0     9.0    1.00   0/3, 0/3
  47      2.0     3.0    .689   3/5, 0/3, 1/3
  48      8.0     7.0    .555   3/3, 0/3, 3/9
  49      5.0     5.0
  50      1.0     4.0    1.00   0/3
  51      6.0     7.0    1.00   0/3
  52      5.0     6.0
  53     10.0    10.0    .861   0/1, 0/1, 5/12
  54      9.0    10.0    1.00   0/3, 0/3
  55      7.0     9.0
  56      8.0     9.0
  57      8.0     9.0
  58      7.0     8.0
  59      9.0     9.0    .833   0/3, 1/3
  60      7.0     8.0    .667   1/3, 1/3
  61      7.0     8.0    .773   5/11, 0/3
  62      5.0     7.0    .733   0/3, 2/2, 0/3, 0/3, 3/9
  63      7.0     8.0    1.00   0/3
  64      1.0     3.0    .500   1/1, 0/3
  65      3.0     6.0    .818   0/3, 6/11, 0/3
  66      7.0     8.0    .583   0/3, 12/12, 0/3, 2/3
  67      6.0     6.0    .667   1/3, 1/3
  68      7.0     9.0    .333   2/3, 2/3
  69      7.0     7.0    1.00   0/3
  70      8.0     9.0
  71      6.0     7.0    1.00   0/3, 0/3
  72      8.0     8.0    1.00   0/3, 0/3, 0/3, 0/3
  73      6.0     8.0    1.00   0/3, 0/3
  74      7.0     9.0    1.00   0/2, 0/3, 0/3, 0/3, 0/3
  75      8.0     8.0    .542   7/12, 1/3
  76      4.0     6.0
  77      8.0     8.0    .778   1/3, 0/3, 1/3
OMS (cont.)

Tech.  Quant.   Qual.    MSEP   Raw MSEP
  78      8.0     9.0    1.00   0/3, 0/3, 0/3
  79      5.0     6.0    1.00   0/3
  80      6.0     6.0    1.00   0/3

FMS

Tech.  Quant.   Qual.    MSEP   Raw MSEP
   1      9.0     9.0    .833   0/3, 1/3
   2      9.0     9.0    1.00   0/3, 0/3
   3      6.0     9.0    .778   0/3, 0/3, 2/3
   4      8.0     9.0    1.00   0/3
   5      3.0     9.0    1.00   0/3, 0/3
   6      1.0     2.0    .833   1/3, 0/3
   7     10.0    10.0    1.00   0/3
   8      9.0     9.0    1.00   0/3
   9      1.0     1.0
  10      6.0     5.0    1.00   0/3, 0/3, 0/3
  11      8.0    10.0    .833   0/3, 1/3
  12      3.0     3.0    .166   2/3, 3/3
  13      6.0     9.0    1.00   0/2
  14      5.0     7.0
  15      2.0     4.0
  16      3.0     3.0
  17      5.0     7.0    .500   1/2, 1/2
  18      5.0     6.0
  19      5.0     8.0    1.00   0/3
  20      6.0     8.0    .267   3/3, 2/3, 3/3, 1/3, 2/3
  21      8.0     9.0
  22      7.0     8.0    1.00   0/3, 0/3
  23      7.0     3.0    1.00   0/3
  24      7.0     8.0    .667   0/3, 0/3, 3/3
  25      8.0     9.0    1.00   0/3, 0/3
  26      8.0     8.0    .888   0/3, 0/3, 1/3
  27      6.0     6.0
  28      6.0     7.0
  29      1.0     1.0
  30      8.0     8.0    1.00   0/3
  31      2.0     2.0
  32      6.0     7.0
  33     10.0     9.0
  34      8.0     9.0    .888   1/3, 0/3, 0/3
  35      6.0     8.0    1.00   0/3
FMS (cont.)

Tech.  Quant.   Qual.    MSEP   Raw MSEP
  36      6.0     7.0
  37      9.0     9.0    1.00   0/3, 0/3
  38      7.0     8.0    1.00   0/3
  39      9.0     9.0    1.00   0/3, 0/3
  40      9.0     8.0    1.00   0/3, 0/2
  41      8.0     7.0    .833   0/3, 1/3
  42      6.0     4.0    1.00   0/3
  43      5.0     6.0    1.00   0/3, 0/3
  44      4.0     3.0
  45      6.0     5.0    .833   1/3, 0/2
  46      7.0     7.0    1.00   0/2
  47      8.0    10.0    .750   0/3, 0/3, 1/3, 2/3
  48      6.0     6.0    1.00   0/3
  49      6.0    10.0    1.00   0/2
  50      4.0     7.0    1.00   0/3, 0/3, 0/3
  51      0.0     8.0    .833   0/3, 1/3
  52      8.0     9.0    .500   0/3, 2/2
  53      5.0     6.0
  54      6.0     8.0    1.00   0/3
  55      7.0     8.0    1.00   0/3
  56      9.0     9.0    1.00   0/3
  57      6.0     6.0    1.00   0/3
  58      5.0     6.0    1.00   0/3
  59      6.0     7.0    1.00   0/3, 0/3
  60      8.0     9.0    .833   1/3, 0/3
  61      9.0     9.0    .867   0/3, 0/3, 0/3, 1/3, 1/3
  62      7.0     7.0    1.00   0/3
  63      9.0    10.0    1.00   0/2, 0/1
  64      7.0     7.0
  65      9.0     9.0    1.00   0/2
  66      7.0     8.0    1.00   0/3
  67      5.0     7.0    1.00   0/3
  68      6.0     8.0    .833   0/3, 1/3
  69      9.0     8.0    1.00   0/3
  70      6.0     5.0    1.00   0/3
  71      8.0     8.0    .833   0/3, 1/3
  72      8.0     9.0    1.00   0/3
  73      7.0     6.0    1.00   0/3
  74      7.0     7.0    1.00   0/3
  75      9.0    10.0    1.00   0/3, 0/3


FMS (cont.)

Tech.  Quant.   Qual.    MSEP   Raw MSEP
  77      4.0     5.0
  78      6.0     7.0    1.00   0/3
  79      9.0     9.0    1.00   0/3, 0/3
  80      9.0     8.0    1.00   0/3
  81      3.0     5.0
  82      8.0     9.0    1.00   0/3
  83      8.0     7.0    1.00   0/3
  84      5.0     6.0
  85      7.0     8.0    1.00   0/3
  86      5.0     9.0    1.00   0/3, 0/3
  87      8.0     7.0    1.00   0/3, 0/3, 0/3
  88      7.0     8.0
  89      6.0     9.0    1.00   0/3


Notes: 1. The technician numbers that appear here are not
the same as the code numbers used in the actual
experiment. The numbers were changed to protect
the identities of the technicians.

2. The quantity and quality ratings are based on a
10.0 scale.

3. The Raw MSEP data reflects the number of
discrepancies found by inspectors versus the
failure baseline for the particular task inspected.

4. The MSEP data is calculated by averaging the
Raw MSEP data and subtracting this value from one.
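The calculation in note 4 can be illustrated with a short Python sketch. It assumes, as note 3 describes, that each Raw MSEP entry is a ratio of discrepancies found to the failure baseline for one inspected task, and that all entries for a technician are weighted equally. The function names `parse_raw_msep` and `msep_score` are hypothetical and are not part of any program used in the study.

```python
from fractions import Fraction


def parse_raw_msep(raw: str) -> list:
    """Split a Raw MSEP cell such as "0/3, 1/3" into exact fractions.

    Each entry is assumed to be (discrepancies found) / (failure baseline)
    for one inspected task, as described in note 3.
    """
    return [Fraction(part.strip()) for part in raw.split(",") if part.strip()]


def msep_score(raw: str) -> float:
    """Note 4: average the Raw MSEP ratios, then subtract the average from one."""
    ratios = parse_raw_msep(raw)
    if not ratios:
        raise ValueError("no Raw MSEP entries to average")
    mean_ratio = sum(ratios) / len(ratios)  # exact arithmetic via Fraction
    return float(1 - mean_ratio)


if __name__ == "__main__":
    for raw in ["1/3, 1/3", "0/3, 0/3, 0/3", "6/11, 2/3, 3/3"]:
        print(f"{raw:>16} -> {msep_score(raw):.3f}")
```

Applied to the entries "1/3, 1/3", "0/3, 0/3, 0/3" and "6/11, 2/3, 3/3", the sketch gives .667, 1.00 and .263, which match the tabled MSEP values for OMS technicians 1, 42 and 26. Some tabled values (for example .888 and .166) appear to be truncated rather than rounded, so the third decimal place may differ by one.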
OMS MSEP from BMDP (Dixon, 1977) Detailed Data Description (P2D) program.
APPENDIX H

F-Test for Equivalence of Sample Variances
FMS Quantity and Quality of Performance
APPENDIX I

Comparison of Sample Distributions
FMS Quantity versus Quality of Performance


Kolmogorov-Smirnov nonparametric two-sample test.a,b

H0: F1(x) = F2(x) for all x.
H1: F1(x) < F2(x) for some x.

Risk: P = .05c

Critical Region: Reject H0 if D > D.05, where m and n are the two sample sizes and

$$D_{.05} = 1.22\sqrt{\frac{m+n}{mn}} = 1.22(.1470) \approx .18$$

Calculation: The cumulative frequencies and sample distribution functions of the two sets of ratings were tabulated for rating values x = 1 through 10. The largest difference between the two sample distribution functions is D = .20.

Conclusion: As D > D.05, reject the null hypothesis and conclude that F1 < F2.


aSource: Gibbons (1976:252).

bAssume that the quantity and quality ratings for each technician by the same supervisor are independent samples from two populations.

cThe risk (P) value is based on the assumption that quantity and quality of performance are continuous variables. If these are in fact discrete variables, then the P value should be considered conservative (Gibbons, 1976:258).
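The one-sided two-sample Kolmogorov-Smirnov statistic can be sketched in Python. This is an illustrative reconstruction, not the calculation actually performed in the study: the hypothetical function `ks_one_sided_d` takes the largest value of S2(x) - S1(x) over the pooled observations (the direction that supports H1: F1(x) < F2(x)), and `ks_critical_05` applies the large-sample constant 1.22 used above. The ratings in the demonstration block are invented for illustration.

```python
import math
from bisect import bisect_right


def sample_cdf(sorted_sample, x):
    """Proportion of observations less than or equal to x (sample must be sorted)."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_one_sided_d(sample1, sample2):
    """One-sided two-sample statistic D = max over x of [S2(x) - S1(x)].

    A large D supports H1: F1(x) < F2(x) for some x, i.e. sample 1 tends
    to take larger values than sample 2. The sample distribution functions
    only change at observed values, so checking the pooled values suffices.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    pooled = sorted(set(s1) | set(s2))
    return max(sample_cdf(s2, x) - sample_cdf(s1, x) for x in pooled)


def ks_critical_05(m, n):
    """Large-sample one-sided .05 critical value: 1.22 * sqrt((m + n) / (m * n))."""
    return 1.22 * math.sqrt((m + n) / (m * n))


if __name__ == "__main__":
    # Hypothetical ratings, not the Appendix D data.
    quality = [9, 8, 9, 10, 7, 9, 8]
    quantity = [7, 6, 8, 9, 5, 7, 8]
    d = ks_one_sided_d(quality, quantity)
    print(f"D = {d:.3f}, D.05 = {ks_critical_05(len(quality), len(quantity)):.3f}")
```

Because the ratings are discrete, ties are common; the sketch evaluates the sample distribution functions only at distinct pooled values. For discrete data the test is generally conservative, and the 1.22 constant is only a large-sample approximation, so it should not be relied on for very small samples such as the demonstration above.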


Behrens-Fisher approximate "t" test (Dudewicz, 1976)
for the case where variance1 ≠ variance2.

1. H0: mean1 = mean2
   H1: mean1 ≠ mean2

2. Degrees of freedom:a

$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = 175.76, \text{ or } 175$$

3. Risk: α = .05

4. Critical Region: Reject H0 if |t*| > t(α/2; v)

5. Calculation of "t":

$$t^* = \frac{\bar{x}_1 - \bar{x}_2}{\left(s_1^2/n_1 + s_2^2/n_2\right)^{1/2}} = 2.179$$

Conclusion: Reject H0; mean1 ≠ mean2.

Type II error: Calculated for a given interval difference assuming equal variance (Freund, 1977:217):

d = .231

Using the tables for two-tailed tests (α = .05) from Freund (1977), the Type II error (β) equals .


aWelch's formulation for degrees of freedom for the Behrens-Fisher problem (Dudewicz, 1976:311).
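The Behrens-Fisher statistic and Welch's degrees of freedom can be computed directly from two samples. The sketch below is a hypothetical helper (`welch_t` is not from the study) that uses the (n - 1) sample variances from Python's `statistics` module; it returns t* and v but leaves the critical value t(α/2; v) to a table.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Approximate t statistic and Welch degrees of freedom for unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    var_mean1 = variance(sample1) / n1  # s1^2 / n1, using the (n - 1) sample variance
    var_mean2 = variance(sample2) / n2  # s2^2 / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(var_mean1 + var_mean2)
    v = (var_mean1 + var_mean2) ** 2 / (
        var_mean1 ** 2 / (n1 - 1) + var_mean2 ** 2 / (n2 - 1)
    )
    return t, v


if __name__ == "__main__":
    # Hypothetical ratings, not the Appendix D data.
    t, v = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
    print(f"t* = {t:.3f}, v = {v:.2f}")
```

For v of about 175, a t table gives a two-sided .05 critical value of roughly 1.97, so the reported t* = 2.179 lies in the rejection region, consistent with the conclusion above.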
APPENDIX J

Correlation Analysis
for Quantity/Quality Associations and
Quality/MSEP Associationsa


1. All linear correlation coefficient calculations were made using the BMDP Bivariate Plot (P6D) program (Dixon, 1977). The bivariate plots and correlation coefficients are presented on pages 128 and 129. The coefficients were tested in the following manner:

   H0: ρ = 0
   H1: ρ ≠ 0

2. Risk = α = .01

3. Critical Region: Reject H0 if |r| > r(α; n - 2)

4. Calculation:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

5. Conclusions: See pages 128 and 129.

aRevised MSEP data includes only personnel with two or more inspections.

(from Snedecor, 1956:178)
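Snedecor's table gives critical values of r directly. An equivalent check, sketched below under the usual assumption of bivariate normal data, converts r to t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom; `critical_r` inverts that relation so that a tabled t value can be turned into a critical r. The helper names are hypothetical, and the critical t value itself must still come from a table.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson product-moment) linear correlation coefficient."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Statistic for H0: rho = 0 on n - 2 degrees of freedom (undefined when |r| = 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Invert t = r*sqrt(n-2)/sqrt(1-r^2) to get the critical |r| for a tabled t."""
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)
```

Rejecting H0 when |t_from_r(r, n)| exceeds the tabled t(α/2; n - 2) is the same decision as rejecting when |r| exceeds `critical_r` of that tabled value.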
APPENDIX K (CONT.)
ANSWER SHEET


Is individual performance important to the maintenance
organization?


Are the ranking forms (Attachments 2 and ...) appropriate
for appraising performance, or can you suggest a better
approach?


Are quantity and quality useful measures of performance?


Which do you consider to be more important, quantity or
quality?


If one is more important than the other, can you indicate
how much more important it is?
APPENDIX L
SURVEY ANSWERS

1. Is individual performance important to the maintenance
organization?

Answer 1: Definitely!

Answer 2: Of course. Maintenance production is the sum
of individual performances.

Answer 3: Without doubt. The integrity of the technician
is all we can depend on. And integrity in this case
translates into quality.

Answer 4: Yes.

Answer 5: Yes.

2.  Are  the  ranking  forms  appropriate  for  appraising 
performance  or  can  you  suggest  a  better  approach? 

Answer  li  Yes. 

Answer  2i  They  may  be  useful,  but  are,  by  themselves, 
shallow.  In  many  jobs  the  technician  either  can  do  the 
job,  or  he  can’t  with  no  gradations  in  quality.  Quality 
in  trouble-shooting  may  not  necessarily  go  with  quality 
of  repair.  In  many  jobs,  particularly  in  avionics, 
poor  quality  will  probably  go  undetected.  Supervisors 
will  probably  be  influenced  by  a  strong  halo  effect. 

Answer 3: They are O.K. I can't suggest a better one
offhand, but I would be interested to know how specific
subordinates "turned out."

Answer 4: May be appropriate for research, but
oversimplified for the many jobs in aircraft maintenance.

Answer 5: MSEP data is currently used for this.

3.  Are  quantity  and  quality  useful  measures  of  performance? 

Answer 1: Definitely important inputs in the total
evaluation of the individual.

Answer 2: (I guess I started answering this above.)
Useful,  but  if  they  are  to  be  the  only  measures,  and  the 
rater  knows  they  are  the  only  measures,  they  will 
reflect  far  more  than  their  titles.  They  will  become 
APR’s. 



Answer 3: Yes.

Answer 4: Yes, although I am biased toward quality
based  on  my  experience  in  Q.C.  Organization  measures 
of  performance  are  also  important. 

Answer 5: Yes; measures currently in use.


4. Which do you consider to be more important, quantity or
quality?

Answer 1: One can hardly be considered without the other.
Everything being equal, I would choose quality over
quantity.

Answer 2: Quality standards must be met in any job,
regardless of quantity (speed). It is more complicated
than that, but "quality" comes first.


Answer 3: Quality.

Answer 4: Quality (see #3).

Answer 5: Quality, with the exception of wartime
conditions; budget is also a factor.

5.  If  one  is  more  important  than  the  other,  can  you  indicate 
how  much  more  important  it  is? 

Answer 1: Quality is more important only to the degree
that without quality maintenance the mission would be
jeopardized, i.e., safety, aborts, out-of-commission
rates, etc. Regardless of the amount of output, if
it's not reliable the quantity would do little for
mission accomplishment. Quality is considerably more
important.

Answer 2: "Quality" is absolutely overriding in importance,
BUT "quality" is not an absolute. For example, if
a perfectly reliable "temporary fix" saves a mission, no
one will fault the loss in quality over a lengthy
permanent repair. The same component in the shop for
overhaul would not be acceptable with the temporary fix.
Another example of the nebulous nature of quality might
be corrosion control. The Air Force, particularly …,
wants factory-new paint jobs. The technician who does
a by-the-book perfect job of inhibiting corrosion on a
prominent panel may, by technical standards, have
achieved quality, but by other standards, have done a
poor job.

Answer 3: Quality is by far more important in most
instances. Quantity, in my estimate, is more important
in a very few cases, probably in wartime, when battle
outcome may depend on speed. In such cases the decrease
in quality can only be tolerated in some areas, and is,
or should be, a calculated thing.

Answer 4: Quality is always more important, with some
possible exceptions during wartime.

Answer 5: No answer.
Glossary


AFB (Air Force Base).

AGE (aerospace ground equipment). All equipment required on
the ground to make a system operational in its intended
environment.

APR (Airmen Performance Rating). An annual or semiannual
appraisal of airmen.


consistent estimator. An estimator is a consistent estimator
of a parameter if, with increasing sample size, the
probability that the value of the statistic is very near
that of the parameter becomes closer and closer to
unity.
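A short simulation can make this definition concrete. The Python sketch below, using hypothetical settings (normal data with unit variance, a tolerance eps = 0.1, 500 trials per sample size), estimates the probability that the sample mean falls within eps of the population mean; for a consistent estimator that probability should rise toward unity as the sample size n grows.

```python
import random
from statistics import fmean


def prob_within(n, eps=0.1, mu=0.0, trials=500, seed=1):
    """Estimate P(|sample mean - mu| < eps) for samples of size n drawn from N(mu, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, 1.0) for _ in range(n)])
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / trials


# The estimated probability should climb toward 1 as n grows.
for n in (10, 100, 1000):
    print(n, prob_within(n))
```

The theoretical values for these settings are roughly 0.25, 0.68, and 0.998, so the simulated fractions should fall near them.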


efficient estimator. An estimator is a more efficient
estimator than another if its standard error is smaller
for the same sample size.
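For normally distributed data both the sample mean and the sample median estimate the population center, but the mean is the more efficient of the two. The sketch below, assuming standard normal samples of size 25 and 2,000 simulated samples (illustrative settings, not drawn from this study), estimates each estimator's standard error as the standard deviation of its simulated values.

```python
import random
from statistics import fmean, median, stdev


def standard_errors(n=25, trials=2000, seed=2):
    """Simulated standard errors of the sample mean and sample median for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return stdev(means), stdev(medians)


se_mean, se_median = standard_errors()
print(f"SE(mean) = {se_mean:.3f}, SE(median) = {se_median:.3f}")
```

The standard error of the mean should come out near 1/sqrt(25) = 0.20; for large normal samples the median's standard error is larger by a factor of about sqrt(pi/2), roughly 1.25.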

estimator. A statistic obtained from a sample to estimate a
population parameter. For instance, the sample mean is
a particularly good estimator for the population mean.


FMS (Field Maintenance Squadron). … fabrication, engine and
aircraft subsystem repair, and …


heteroscedasticity. The case where the regression error variance
is not constant over all observations.

histogram. A graphical portrayal of a population frequency
distribution.
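A rough histogram can be sketched in plain text by tallying how often each score occurs. The example below assumes integer ratings on a 1-to-10 scale; the ratings list and the function name `text_histogram` are hypothetical, not data or tools from this study.

```python
from collections import Counter


def text_histogram(values, lo=1, hi=10):
    """Return a text histogram: one row per score from lo to hi, one '#' per occurrence."""
    counts = Counter(values)  # missing scores count as 0
    rows = [f"{score:>2} | {'#' * counts[score]}" for score in range(lo, hi + 1)]
    return "\n".join(rows)


ratings = [8, 7, 9, 8, 6, 8, 10, 7, 5, 8, 9, 7]  # hypothetical ratings
print(text_histogram(ratings))
```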


kurtosis. The degree to which a distribution is more or less peaked than a normal distribution.
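Kurtosis is commonly measured as excess kurtosis, m4 / m2^2 - 3, where m2 and m4 are the second and fourth central moments. It is 0 for a normal distribution, positive for distributions more peaked (and heavier-tailed) than normal, and negative for flatter ones. The sketch below computes it for simulated uniform, normal, and Laplace samples, whose theoretical values are -1.2, 0, and 3; the distributions and the sample size are illustrative choices.

```python
import random
from math import fsum


def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = fsum(data) / n
    m2 = fsum((x - mean) ** 2 for x in data) / n
    m4 = fsum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(6)
N = 50_000
samples = {
    "uniform (flat)": [rng.uniform(0.0, 1.0) for _ in range(N)],
    "normal": [rng.gauss(0.0, 1.0) for _ in range(N)],
    # The difference of two unit exponentials is Laplace-distributed (sharply peaked).
    "Laplace (peaked)": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(N)],
}
for name, data in samples.items():
    print(f"{name:>16}: {excess_kurtosis(data):+.2f}")
```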


MMICS (Maintenance Management Information and Control
System). A base-level computer system used to improve
the effectiveness of maintenance organizations.


MSEP (Maintenance Standardization and Evaluation Program).
A quantitative quality control program designed to check
individual technical competencies and the quality of
maintenance through evaluations and inspections.

OMS (Organizational Maintenance Squadron). OMS is responsible
for aircraft launching, recovery, and inspections.




non-parametric test. A test which makes no hypothesis about
the value of a population parameter.

regression analysis. An analysis which indicates how one
variable is related to another. It provides an equation
wherein the known value of one variable may be used to
estimate the unknown value of the other. It is distinct
from correlation analysis, which indicates the degree to
which two variables are related.

skewed. A population is skewed when the mean, median, and
mode do not coincide.
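The definition can be checked directly by computing the three measures. In the hypothetical right-tailed sample below, the long tail pulls the mean above the median, which in turn lies above the mode; that ordering is the usual, though not invariable, sign of positive skew. The function name `describe_skew` and its labels are illustrative.

```python
from statistics import fmean, median, mode


def describe_skew(data):
    """Compare mean, median, and mode; unequal values indicate a skewed distribution."""
    m, md, mo = fmean(data), median(data), mode(data)
    if m > md > mo:
        shape = "right (positively) skewed"
    elif m < md < mo:
        shape = "left (negatively) skewed"
    elif m == md == mo:
        shape = "no skew indicated"
    else:
        shape = "mixed ordering"
    return m, md, mo, shape


# Hypothetical data with a long tail to the right.
print(describe_skew([1, 2, 2, 2, 3, 3, 4, 5, 9, 12]))
```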


SKT (Skill Knowledge Test). A paper-and-pencil test tailored
to specific technician specialties prior to promotion
evaluations.


Type I error (α). The probability of rejecting a true
hypothesis.

Type II error (β). The probability of accepting a false
hypothesis.
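Both error rates can be estimated by simulation. The sketch assumes a one-tailed z test of H0: mu = 0 with known sigma = 1, n = 20, and alpha = 0.05. When H0 is true every rejection is a Type I error; when the true mean is 0.5 (a hypothetical alternative) every acceptance is a Type II error. The estimates should land near 0.05 and near the theoretical beta of about 0.28.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(mu_true, alpha=0.05, n=20, trials=4000, seed=4):
    """Share of samples for which a one-tailed z test rejects H0: mu = 0 (sigma = 1 known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_true, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (xbar - 0) / (sigma / sqrt(n)) with sigma = 1
        if z > z_crit:
            rejections += 1
    return rejections / trials


type_i = rejection_rate(0.0)       # H0 true: every rejection is a Type I error
type_ii = 1 - rejection_rate(0.5)  # H0 false: every acceptance is a Type II error
print(f"estimated alpha = {type_i:.3f}, estimated beta = {type_ii:.3f}")
```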

unbiased estimator. A statistic that has an expected value
equal to the population parameter being estimated.
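The sample variance computed with divisor n - 1 is the standard example of an unbiased estimator, while the same sum of squares divided by n is biased low by the factor (n - 1)/n. The sketch below averages both over many simulated samples of size 5 from a normal population with variance 4 (hypothetical settings); the averages should fall near 4.0 and 3.2 respectively.

```python
import random
from statistics import fmean, pvariance, variance


def average_estimates(n=5, trials=20000, seed=5):
    """Average the n - 1 and n divisor variance estimates over many N(0, 2**2) samples."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]  # population variance = 4
        unbiased.append(variance(sample))  # divisor n - 1
        biased.append(pvariance(sample))   # divisor n
    return fmean(unbiased), fmean(biased)


avg_unbiased, avg_biased = average_estimates()
print(f"mean of s^2 (n - 1): {avg_unbiased:.2f}; mean with divisor n: {avg_biased:.2f}")
```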